In this document, I have included the results from the initial exploration of the different models: model outputs, rankings of covariate influence, performance metrics, and prediction maps. This first set of models only includes covariate data extracted at a daily temporal resolution, but I am also considering models that include covariate data at a seasonal or annual temporal resolution. The pseudo-absences used in these models were generated using background sampling approaches. Hyperparameters were tuned using the caret package; across all models, a learning rate of 0.05 and a tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a set of random numbers that will be used to identify which predictor variables should be included in the final model and which are not informative.
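As a minimal sketch of how the random ‘pred_var’ predictor could be used for screening, the snippet below compares each predictor's relative influence against that of ‘pred_var’, using the values reported for the first base model (no spatial or tag ID predictors). The helper name `screen_predictors` and the rule "keep anything with higher influence than the random predictor" are assumptions for illustration, not necessarily the criterion that will be applied to the final model.

```r
# Relative influence values copied from the first base model's output
# (no spatial or tag ID predictors).
rel_inf <- c(
  bathy_mean = 37.726838, temp_mean = 23.806676, sal_mean = 7.021355,
  chl_mean = 5.980357, ssh_mean = 5.413233, uostr_mean = 5.244057,
  vostr_mean = 3.838429, bathy_sd = 2.871536, mld_mean = 2.581925,
  uo_mean = 2.392186, vo_mean = 1.868975, pred_var = 1.254433
)

# Hypothetical helper: split predictors by whether their relative influence
# exceeds that of the random predictor.
screen_predictors <- function(rel_inf, random_name = "pred_var") {
  threshold <- rel_inf[[random_name]]
  candidates <- rel_inf[names(rel_inf) != random_name]
  list(
    keep = names(candidates)[candidates > threshold],
    drop = names(candidates)[candidates <= threshold]
  )
}

screened <- screen_predictors(rel_inf)
print(screened$keep)  # predictors that outrank the random predictor
print(screened$drop)  # predictors at or below the random predictor
```

Under this rule, every environmental predictor in that model outranks ‘pred_var’, so none would be dropped on this basis alone.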
The hypotheses I would like to test with these models are as follows:
H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.
study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?
H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.
study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?
H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.
study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?
H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.
study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?
H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.
study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?
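Several of these hypotheses are judged on predictive performance scores such as AUC and TSS. The sketch below shows the standard definitions on a toy set of observations and predictions; the function names `auc_rank` and `tss_at` and the 0.5 threshold are assumptions for illustration, and the scores reported in this document may have been computed differently (for example, with a different threshold rule).

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic).
auc_rank <- function(obs, pred) {
  n_pres <- sum(obs == 1)
  n_abs <- sum(obs == 0)
  ranks <- rank(pred)  # ties receive average ranks
  (sum(ranks[obs == 1]) - n_pres * (n_pres + 1) / 2) / (n_pres * n_abs)
}

# True skill statistic (sensitivity + specificity - 1) at a fixed threshold.
tss_at <- function(obs, pred, threshold = 0.5) {
  predicted_presence <- pred >= threshold
  sensitivity <- sum(predicted_presence & obs == 1) / sum(obs == 1)
  specificity <- sum(!predicted_presence & obs == 0) / sum(obs == 0)
  sensitivity + specificity - 1
}

# Toy data: 3 presences and 3 pseudo-absences with predicted suitability.
obs <- c(1, 1, 1, 0, 0, 0)
pred <- c(0.9, 0.8, 0.4, 0.6, 0.2, 0.1)

auc_toy <- auc_rank(obs, pred)
tss_toy <- tss_at(obs, pred)
print(c(AUC = auc_toy, TSS = tss_toy))
```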
Base models
These three models represent different options for the base model: one with neither spatial predictors nor a tag ID predictor, one with a tag ID predictor only, and one with both. These models were developed by splitting the data set into 75/25 train/test subsets, so that split is the model evaluation approach used here. Once a model is selected, I can run additional evaluations (e.g., LOO or k-fold cross-validation). I can also complete these now, depending on when that is typically performed.
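For reference, a 75/25 split and a k-fold assignment can be sketched in base R as follows. This is an illustration on synthetic data, not the code used to build the models here; the helper names `split_train_test` and `assign_folds`, the toy data frame, and the seed are all assumptions.

```r
set.seed(42)  # arbitrary seed so the example is reproducible

# Synthetic stand-in for the presence/pseudo-absence data.
toy <- data.frame(
  presence = rep(c(1, 0), each = 50),
  temp_mean = rnorm(100, mean = 15, sd = 2)
)

# Hypothetical helper: random split into training and testing subsets.
split_train_test <- function(dat, train_frac = 0.75) {
  train_rows <- sample(nrow(dat), size = floor(train_frac * nrow(dat)))
  list(
    train = dat[train_rows, , drop = FALSE],
    test = dat[-train_rows, , drop = FALSE]
  )
}

# Hypothetical helper: assign each row to one of k roughly equal folds.
assign_folds <- function(n, k = 5) {
  sample(rep(seq_len(k), length.out = n))
}

parts <- split_train_test(toy)
folds <- assign_folds(nrow(toy), k = 5)

print(c(train = nrow(parts$train), test = nrow(parts$test)))
print(table(folds))
```

A random row-wise split like this ignores which tag each location came from; whether folds should instead be grouped by tag ID is a separate design decision.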
explore_brt(mod_file_path = brt_outputs[7],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862741
Residual.Deviance 0.2948092
Correlation 0.9249630
AUC 0.9922000
Per.Expl 78.7336988
cvDeviance 0.5909966
cvCorrelation 0.8025147
cvAUC 0.9464300
cvPer.Expl 57.3679835
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 37.726838
temp_mean 23.806676
sal_mean 7.021355
chl_mean 5.980357
ssh_mean 5.413233
uostr_mean 5.244057
vostr_mean 3.838429
bathy_sd 2.871536
mld_mean 2.581925
uo_mean 2.392186
vo_mean 1.868975
pred_var 1.254433
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 2 temp_mean 835.60
2 10 bathy_mean 8 ssh_mean 650.18
3 8 ssh_mean 2 temp_mean 556.11
4 10 bathy_mean 3 sal_mean 496.83
5 10 bathy_mean 4 uo_mean 406.37
6 3 sal_mean 2 temp_mean 343.56
7 8 ssh_mean 1 chl_mean 337.30
[1] "External percent deviance explained"
[1] 0.7242147
[1] "TPR"
[1] 0.7393556
[1] "TSS"
[1] 0.8700281
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4250 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2312832 0.8884673 0.9794331 0.9980034 0.7242147 0.787337
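The Per.Expl and cvPer.Expl values in the output above appear to follow directly from the deviance values as 100 × (1 − residual deviance / total deviance). The short check below reproduces them from the reported numbers; the function name `pct_deviance_explained` is an assumption for illustration.

```r
# Percent deviance explained from total (null) and residual deviance.
pct_deviance_explained <- function(total_dev, resid_dev) {
  100 * (1 - resid_dev / total_dev)
}

# Values copied from the first base model's output.
total_dev <- 1.3862741
train_pct <- pct_deviance_explained(total_dev, 0.2948092)  # Residual.Deviance
cv_pct <- pct_deviance_explained(total_dev, 0.5909966)     # cvDeviance

print(round(c(Per.Expl = train_pct, cvPer.Expl = cv_pct), 3))

# For comparison: mean Bernoulli null deviance when presences and
# pseudo-absences are exactly balanced (prevalence 0.5) is 2 * log(2).
null_dev_balanced <- 2 * log(2)
print(null_dev_balanced)
```

The total deviance reported for every model is very close to 2 × log(2) ≈ 1.3863, which would be consistent with a nearly 1:1 ratio of presences to pseudo-absences.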
explore_brt(mod_file_path = brt_outputs[8],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38627408
Residual.Deviance 0.09736975
Correlation 0.98457770
AUC 0.99990000
Per.Expl 92.97615463
cvDeviance 0.34392232
cvCorrelation 0.89862302
cvAUC 0.97914000
cvPer.Expl 75.19088555
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.0639964
tag 24.5885909
temp_mean 18.9524289
ssh_mean 4.9851264
sal_mean 4.0438501
uostr_mean 3.9759944
chl_mean 3.8548424
vostr_mean 2.6340512
bathy_sd 1.2289919
uo_mean 1.1341238
vo_mean 1.1196672
mld_mean 0.8787739
pred_var 0.5395627
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 sal_mean 1 tag 1883.53
2 11 bathy_mean 1 tag 770.37
3 2 chl_mean 1 tag 714.07
4 3 temp_mean 1 tag 626.86
5 9 ssh_mean 1 tag 604.60
6 8 vostr_mean 1 tag 409.93
7 7 vo_mean 1 tag 382.85
8 6 uostr_mean 1 tag 370.45
[1] "External percent deviance explained"
[1] 0.8761131
[1] "TPR"
[1] 0.761429
[1] "TSS"
[1] 0.9611333
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6500 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1413529 0.9600136 0.9941374 0.9928895 0.8761131 0.9297615
explore_brt(mod_file_path = brt_outputs[9],
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38627408
Residual.Deviance 0.08949741
Correlation 0.98503325
AUC 0.99990000
Per.Expl 93.54403230
cvDeviance 0.29985176
cvCorrelation 0.91378722
cvAUC 0.98270000
cvPer.Expl 78.36995123
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.6117764
tag 20.0388250
lat 8.8678338
temp_mean 4.2491755
bathy_mean 3.6159044
chl_mean 2.7994230
sal_mean 2.3492197
ssh_mean 1.1726958
vostr_mean 1.1624008
vo_mean 0.6291947
uo_mean 0.6153267
bathy_sd 0.5647726
uostr_mean 0.5065045
mld_mean 0.4821748
pred_var 0.3347723
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 737.42
2 5 sal_mean 1 tag 551.43
3 12 bathy_mean 1 tag 502.86
4 3 chl_mean 1 tag 464.36
5 14 dist_coast 1 tag 419.05
6 9 vostr_mean 1 tag 274.27
7 8 vo_mean 1 tag 270.06
8 10 ssh_mean 1 tag 227.94
9 4 temp_mean 1 tag 195.29
10 13 bathy_sd 1 tag 186.34
11 7 uostr_mean 1 tag 171.34
[1] "External percent deviance explained"
[1] 0.8873391
[1] "TPR"
[1] 0.7703599
[1] "TSS"
[1] 0.9644792
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5350 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1354571 0.9632775 0.9952196 0.9908541 0.8873391 0.9354403
DO models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables at longer time scales (seasonal/annual).
0m, no spatial, yes tag
0m, yes spatial, yes tag
0m & 60m, no spatial, yes tag
0m & 250m, no spatial, yes tag
0m, 60m, & 250m, no spatial, yes tag
0m, 60m, & 250m, yes spatial, yes tag
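As a rough sketch of how daily covariates could be summarized at the longer time scales mentioned above, the snippet below aggregates a synthetic daily surface oxygen series into seasonal and annual means with base R. The data values, the column layout, and the month-to-season mapping are all assumptions for illustration; the seasons that matter ecologically for this study (e.g., upwelling periods) may be defined differently.

```r
# Synthetic daily series for one year (values are made up for illustration).
daily <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
)
day_of_year <- as.integer(format(daily$date, "%j"))
daily$o2_mean_0m <- 250 + 20 * sin(2 * pi * day_of_year / 366)

# Assumed month-to-season mapping (December to February = winter, and so on).
season_lookup <- c("winter", "winter", "spring", "spring", "spring",
                   "summer", "summer", "summer", "fall", "fall", "fall",
                   "winter")
daily$season <- season_lookup[as.integer(format(daily$date, "%m"))]

# Seasonal means (one row per season) and a single annual mean.
seasonal <- aggregate(o2_mean_0m ~ season, data = daily, FUN = mean)
annual_mean <- mean(daily$o2_mean_0m)

print(seasonal)
print(annual_mean)
```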
explore_brt(mod_file_path = brt_outputs[14],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.08039145
Correlation 0.98792844
AUC 1.00000000
Per.Expl 94.20097610
cvDeviance 0.30084003
cvCorrelation 0.91332970
cvAUC 0.98319000
cvPer.Expl 78.29895482
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.6469598
o2_mean_0m 26.9115748
tag 20.0046262
temp_mean 4.4981160
chl_mean 3.6729261
ssh_mean 2.6241221
uostr_mean 2.1630542
sal_mean 2.0577804
vostr_mean 1.8988935
mld_mean 0.9525212
uo_mean 0.7373759
bathy_sd 0.7185793
vo_mean 0.7077598
pred_var 0.4057107
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 5 sal_mean 1 tag 1251.29
2 4 temp_mean 2 o2_mean_0m 856.53
3 2 o2_mean_0m 1 tag 838.02
4 12 bathy_mean 1 tag 811.97
5 4 temp_mean 1 tag 452.47
6 3 chl_mean 1 tag 413.64
7 13 bathy_sd 1 tag 363.30
8 8 vo_mean 1 tag 348.54
9 7 uostr_mean 1 tag 340.42
10 9 vostr_mean 1 tag 299.61
[1] "External percent deviance explained"
[1] 0.9030827
[1] "TPR"
[1] 0.7699214
[1] "TSS"
[1] 0.9727243
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1204197 0.9712985 0.9963055 0.9957087 0.9030827 0.9420098
explore_brt(mod_file_path = brt_outputs[15],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06074708
Correlation 0.99206350
AUC 1.00000000
Per.Expl 95.61801965
cvDeviance 0.26396205
cvCorrelation 0.92768023
cvAUC 0.98584000
cvPer.Expl 80.95914152
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.1425792
tag 18.4444753
o2_mean_0m 10.5833390
lat 7.0053558
bathy_mean 3.1295802
chl_mean 2.2557239
sal_mean 1.5213093
temp_mean 1.2786828
vostr_mean 0.9690416
ssh_mean 0.9508931
mld_mean 0.5242121
vo_mean 0.5180216
uo_mean 0.4790866
bathy_sd 0.4771174
uostr_mean 0.4558059
pred_var 0.2647762
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 o2_mean_0m 1 tag 779.87
2 2 lat 1 tag 692.69
3 5 temp_mean 3 o2_mean_0m 636.47
4 15 dist_coast 1 tag 466.74
5 6 sal_mean 1 tag 426.43
6 4 chl_mean 1 tag 421.31
7 13 bathy_mean 1 tag 420.13
8 14 bathy_sd 1 tag 344.47
9 9 vo_mean 1 tag 303.38
10 10 vostr_mean 1 tag 230.16
11 5 temp_mean 1 tag 222.22
12 8 uostr_mean 1 tag 208.55
13 15 dist_coast 3 o2_mean_0m 174.48
[1] "External percent deviance explained"
[1] 0.9209995
[1] "TPR"
[1] 0.7838581
[1] "TSS"
[1] 0.9801804
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1055065 0.9780055 0.9968954 0.9930044 0.9209995 0.9561802
explore_brt(mod_file_path = brt_outputs[13],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.07118474
Correlation 0.98998036
AUC 1.00000000
Per.Expl 94.86510106
cvDeviance 0.28488053
cvCorrelation 0.92010488
cvAUC 0.98419000
cvPer.Expl 79.45019060
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.0714692
o2_mean_0m 26.9541769
tag 18.8753315
o2_mean_60m 10.1542238
chl_mean 3.3838023
ssh_mean 2.9041357
temp_mean 1.9892805
sal_mean 1.6667460
vostr_mean 1.1791073
uostr_mean 1.0524747
mld_mean 0.7605733
uo_mean 0.6030973
vo_mean 0.5369837
bathy_sd 0.5062655
pred_var 0.3623321
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 o2_mean_0m 1 tag 914.39
2 5 sal_mean 1 tag 778.04
3 12 bathy_mean 1 tag 774.90
4 4 temp_mean 2 o2_mean_0m 449.37
5 3 chl_mean 1 tag 439.91
6 13 bathy_sd 1 tag 427.34
7 4 temp_mean 1 tag 381.87
8 14 o2_mean_60m 1 tag 355.36
9 9 vostr_mean 1 tag 293.42
10 8 vo_mean 1 tag 292.69
11 10 ssh_mean 1 tag 259.83
[1] "External percent deviance explained"
[1] 0.9097515
[1] "TPR"
[1] 0.7719523
[1] "TSS"
[1] 0.9762715
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1144131 0.9740819 0.9964178 0.9952997 0.9097515 0.948651
explore_brt(mod_file_path = brt_outputs[10],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06836473
Correlation 0.99037738
AUC 1.00000000
Per.Expl 95.06852165
cvDeviance 0.28369551
cvCorrelation 0.92037443
cvAUC 0.98436000
cvPer.Expl 79.53567176
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 27.2629760
bathy_mean 25.2498534
tag 18.4421903
o2_mean_250m 16.5248776
chl_mean 2.2649308
temp_mean 1.9626402
sal_mean 1.7897130
ssh_mean 1.5479838
uostr_mean 1.1276963
vostr_mean 1.0966645
bathy_sd 0.7244073
vo_mean 0.5873495
mld_mean 0.5347793
uo_mean 0.5208155
pred_var 0.3631225
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 5 sal_mean 1 tag 1081.33
2 2 o2_mean_0m 1 tag 831.07
3 4 temp_mean 2 o2_mean_0m 634.92
4 14 o2_mean_250m 1 tag 580.53
5 3 chl_mean 1 tag 508.53
6 12 bathy_mean 1 tag 461.58
7 9 vostr_mean 1 tag 296.67
8 4 temp_mean 1 tag 295.22
9 8 vo_mean 1 tag 272.25
10 14 o2_mean_250m 2 o2_mean_0m 254.15
11 13 bathy_sd 1 tag 249.56
[1] "External percent deviance explained"
[1] 0.9100334
[1] "TPR"
[1] 0.7785341
[1] "TSS"
[1] 0.9772948
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1153394 0.9735764 0.9963606 0.9937007 0.9100334 0.9506852
explore_brt(mod_file_path = brt_outputs[11],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06749084
Correlation 0.99030698
AUC 1.00000000
Per.Expl 95.13155977
cvDeviance 0.27553664
cvCorrelation 0.92358978
cvAUC 0.98486000
cvPer.Expl 80.12421074
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 27.1899223
bathy_mean 24.5062176
tag 17.3552811
o2_mean_250m 14.4641723
o2_mean_60m 5.7991520
ssh_mean 1.9059494
chl_mean 1.8501917
sal_mean 1.4611193
temp_mean 1.4363430
uostr_mean 0.8686807
vostr_mean 0.8420687
bathy_sd 0.5476465
vo_mean 0.5305118
mld_mean 0.5035008
uo_mean 0.4481542
pred_var 0.2910886
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 o2_mean_0m 1 tag 817.46
2 5 sal_mean 1 tag 587.30
3 4 temp_mean 2 o2_mean_0m 558.25
4 15 o2_mean_250m 1 tag 478.71
5 3 chl_mean 1 tag 429.20
6 12 bathy_mean 1 tag 410.22
7 4 temp_mean 1 tag 305.22
8 14 o2_mean_60m 1 tag 300.61
9 9 vostr_mean 1 tag 270.41
10 7 uostr_mean 1 tag 228.06
11 13 bathy_sd 1 tag 205.43
12 15 o2_mean_250m 2 o2_mean_0m 203.41
13 10 ssh_mean 1 tag 189.41
[1] "External percent deviance explained"
[1] 0.9131763
[1] "TPR"
[1] 0.7787795
[1] "TSS"
[1] 0.9783368
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1127805 0.9747411 0.9964709 0.9948774 0.9131763 0.9513156
explore_brt(mod_file_path = brt_outputs[12],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38629281
Residual.Deviance 0.06674273
Correlation 0.99036056
AUC 1.00000000
Per.Expl 95.18552429
cvDeviance 0.25849355
cvCorrelation 0.92868348
cvAUC 0.98632000
cvPer.Expl 81.35361122
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 50.7979905
tag 17.4187334
o2_mean_0m 11.1532694
o2_mean_250m 4.8956535
lat 4.3567015
o2_mean_60m 2.7427545
chl_mean 1.7488786
bathy_mean 1.2868608
sal_mean 1.0812867
temp_mean 0.9791718
vostr_mean 0.6174175
ssh_mean 0.6127769
uostr_mean 0.5321097
bathy_sd 0.4024418
vo_mean 0.3965878
mld_mean 0.3741117
uo_mean 0.3487283
pred_var 0.2545257
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 o2_mean_0m 1 tag 662.12
2 5 temp_mean 3 o2_mean_0m 481.90
3 2 lat 1 tag 385.52
4 6 sal_mean 1 tag 348.70
5 4 chl_mean 1 tag 336.40
6 14 bathy_sd 1 tag 330.96
7 17 o2_mean_250m 1 tag 293.11
8 13 bathy_mean 1 tag 276.05
9 15 dist_coast 1 tag 246.21
10 10 vostr_mean 1 tag 223.18
11 16 o2_mean_60m 1 tag 213.22
12 9 vo_mean 1 tag 206.61
13 16 o2_mean_60m 5 temp_mean 175.27
14 5 temp_mean 1 tag 144.27
15 15 dist_coast 3 o2_mean_0m 140.51
16 11 ssh_mean 1 tag 109.39
[1] "External percent deviance explained"
[1] 0.9161662
[1] "TPR"
[1] 0.7804632
[1] "TSS"
[1] 0.9780841
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1109025 0.9756128 0.9968319 0.9939269 0.9161662 0.9518552
AGI models
I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables at longer time scales (seasonal/annual).
0m, no spatial, yes tag
0m, yes spatial, yes tag
0m & 60m, no spatial, yes tag
0m & 250m, no spatial, yes tag
0m, 60m, & 250m, no spatial, yes tag
0m, 60m, & 250m, yes spatial, yes tag
explore_brt(mod_file_path = brt_outputs[5],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.08532138
Correlation 0.98702006
AUC 1.00000000
Per.Expl 93.84534182
cvDeviance 0.31310146
cvCorrelation 0.91077834
cvAUC 0.98123000
cvPer.Expl 77.41442573
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.3849938
tag 22.9924922
temp_mean 19.0362797
ssh_mean 5.1405835
uostr_mean 4.5173002
AGI_0m 3.9279250
sal_mean 3.5882270
chl_mean 2.8743197
vostr_mean 2.5250985
bathy_sd 1.2576208
uo_mean 0.9020629
vo_mean 0.8232203
mld_mean 0.6439118
pred_var 0.3859644
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1816.06
2 4 sal_mean 1 tag 1301.25
3 3 temp_mean 1 tag 851.13
4 11 bathy_mean 1 tag 729.91
5 2 chl_mean 1 tag 428.42
6 7 vo_mean 1 tag 417.68
7 9 ssh_mean 1 tag 328.79
8 8 vostr_mean 1 tag 326.69
9 11 bathy_mean 3 temp_mean 318.21
10 13 AGI_0m 1 tag 295.11
[1] "External percent deviance explained"
[1] 0.900391
[1] "TPR"
[1] 0.7649387
[1] "TSS"
[1] 0.969663
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.126741 0.9682712 0.9975976 0.9939181 0.900391 0.9384534
explore_brt(mod_file_path = brt_outputs[6],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.07272435
Correlation 0.98931402
AUC 1.00000000
Per.Expl 94.75402902
cvDeviance 0.27470458
cvCorrelation 0.92333512
cvAUC 0.98468000
cvPer.Expl 80.18418485
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.4969130
tag 19.1224933
lat 8.8661139
temp_mean 4.4530414
bathy_mean 3.4033437
AGI_0m 2.7486874
chl_mean 2.2110726
sal_mean 1.8920127
ssh_mean 1.0292439
vostr_mean 0.8247087
bathy_sd 0.5853612
vo_mean 0.5707580
uo_mean 0.5420383
uostr_mean 0.5176506
mld_mean 0.4773298
pred_var 0.2592316
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 711.46
2 5 sal_mean 1 tag 430.54
3 13 bathy_sd 1 tag 389.47
4 3 chl_mean 1 tag 388.51
5 12 bathy_mean 1 tag 385.22
6 15 dist_coast 1 tag 319.70
7 14 AGI_0m 1 tag 299.95
8 14 AGI_0m 4 temp_mean 291.55
9 4 temp_mean 1 tag 271.29
10 8 vo_mean 1 tag 270.02
11 9 vostr_mean 1 tag 263.25
12 14 AGI_0m 2 lat 173.01
13 10 ssh_mean 1 tag 168.72
[1] "External percent deviance explained"
[1] 0.9135304
[1] "TPR"
[1] 0.7741413
[1] "TSS"
[1] 0.9745233
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5600 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.116643 0.9730895 0.9978667 0.9952253 0.9135304 0.9475403
explore_brt(mod_file_path = brt_outputs[4],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.07558239
Correlation 0.98942417
AUC 1.00000000
Per.Expl 94.54786401
cvDeviance 0.29847916
cvCorrelation 0.91616548
cvAUC 0.98261000
cvPer.Expl 78.46920527
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.1481369
tag 22.3947478
temp_mean 19.6320724
AGI_0m 4.5763214
uostr_mean 4.0458198
AGI_60m 3.9505830
ssh_mean 3.2901697
sal_mean 3.2745439
vostr_mean 2.2604520
chl_mean 2.1686907
bathy_sd 0.8962724
uo_mean 0.7362601
vo_mean 0.7075609
mld_mean 0.5826191
pred_var 0.3357501
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1887.49
2 4 sal_mean 1 tag 1176.99
3 3 temp_mean 1 tag 753.28
4 11 bathy_mean 1 tag 637.60
5 14 AGI_60m 1 tag 576.97
6 8 vostr_mean 1 tag 433.15
7 2 chl_mean 1 tag 415.56
8 12 bathy_sd 1 tag 400.28
9 11 bathy_mean 3 temp_mean 363.16
10 7 vo_mean 1 tag 338.63
11 9 ssh_mean 1 tag 259.86
[1] "External percent deviance explained"
[1] 0.9087836
[1] "TPR"
[1] 0.7648856
[1] "TSS"
[1] 0.9734918
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1193707 0.9718873 0.9977224 0.9954992 0.9087836 0.9454786
explore_brt(mod_file_path = brt_outputs[1],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.09622781
Correlation 0.98319352
AUC 0.99980000
Per.Expl 93.05860674
cvDeviance 0.30428517
cvCorrelation 0.91335265
cvAUC 0.98178000
cvPer.Expl 78.05038891
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.2607017
tag 20.7337462
temp_mean 17.6619552
AGI_250m 12.8869113
uostr_mean 5.3151550
ssh_mean 4.0761838
AGI_0m 3.6152744
sal_mean 3.0128057
chl_mean 1.9236281
vostr_mean 1.2761366
bathy_sd 1.1067112
vo_mean 0.6687755
uo_mean 0.6510241
mld_mean 0.5242352
pred_var 0.2867560
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1354.07
2 4 sal_mean 1 tag 1099.13
3 3 temp_mean 1 tag 650.85
4 14 AGI_250m 1 tag 423.38
5 12 bathy_sd 1 tag 385.64
6 2 chl_mean 1 tag 371.29
7 11 bathy_mean 1 tag 357.84
8 9 ssh_mean 1 tag 292.44
9 7 vo_mean 1 tag 283.14
10 8 vostr_mean 1 tag 229.35
11 13 AGI_0m 1 tag 216.49
[1] "External percent deviance explained"
[1] 0.8891345
[1] "TPR"
[1] 0.7652902
[1] "TSS"
[1] 0.9632257
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4900 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1356224 0.9633649 0.996446 0.9939279 0.8891345 0.9305861
explore_brt(mod_file_path = brt_outputs[2],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.08163443
Correlation 0.98713893
AUC 1.00000000
Per.Expl 94.11130005
cvDeviance 0.29071385
cvCorrelation 0.91828031
cvAUC 0.98297000
cvPer.Expl 79.02935599
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 25.7475201
tag 20.6577555
temp_mean 18.3535453
AGI_250m 12.2429766
uostr_mean 4.4230811
ssh_mean 4.1802787
AGI_0m 4.0304067
sal_mean 2.6720505
AGI_60m 1.6636685
chl_mean 1.5954124
vostr_mean 1.2580929
bathy_sd 1.1907574
vo_mean 0.7000913
uo_mean 0.5201387
mld_mean 0.4689463
pred_var 0.2952780
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1653.43
2 4 sal_mean 1 tag 1094.62
3 3 temp_mean 1 tag 585.48
4 14 AGI_60m 1 tag 369.06
5 8 vostr_mean 1 tag 336.94
6 11 bathy_mean 1 tag 335.06
7 15 AGI_250m 1 tag 309.88
8 12 bathy_sd 1 tag 303.52
9 2 chl_mean 1 tag 295.26
10 9 ssh_mean 1 tag 239.61
11 7 vo_mean 1 tag 210.65
12 13 AGI_0m 1 tag 163.15
13 5 uo_mean 1 tag 142.51
[1] "External percent deviance explained"
[1] 0.9035623
[1] "TPR"
[1] 0.766589
[1] "TSS"
[1] 0.9722347
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1243517 0.9693557 0.9973967 0.9938367 0.9035623 0.941113
explore_brt(mod_file_path = brt_outputs[3],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.38628958
Residual.Deviance 0.06368071
Correlation 0.99128982
AUC 1.00000000
Per.Expl 95.40639170
cvDeviance 0.26342427
cvCorrelation 0.92717887
cvAUC 0.98558000
cvPer.Expl 80.99788972
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.3167942
tag 19.4852974
lat 8.2334957
temp_mean 3.9473184
AGI_250m 3.3725702
AGI_0m 2.2700357
bathy_mean 1.9914815
chl_mean 1.7837172
sal_mean 1.4620165
AGI_60m 1.1307988
ssh_mean 0.8257918
vostr_mean 0.5574685
bathy_sd 0.5546149
vo_mean 0.5191201
uostr_mean 0.4545932
uo_mean 0.4254801
mld_mean 0.4233323
pred_var 0.2460737
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 lat 1 tag 716.35
2 13 bathy_sd 1 tag 437.20
3 3 chl_mean 1 tag 402.35
4 5 sal_mean 1 tag 378.54
5 16 AGI_60m 1 tag 342.53
6 17 AGI_250m 1 tag 264.53
7 12 bathy_mean 1 tag 261.67
8 4 temp_mean 1 tag 240.35
9 15 dist_coast 1 tag 215.24
10 14 AGI_0m 1 tag 210.42
11 8 vo_mean 1 tag 200.95
12 9 vostr_mean 1 tag 193.50
13 14 AGI_0m 4 temp_mean 189.18
14 14 AGI_0m 2 lat 124.05
15 6 uo_mean 1 tag 115.00
16 10 ssh_mean 1 tag 113.62
[1] "External percent deviance explained"
[1] 0.9207309
[1] "TPR"
[1] 0.7758845
[1] "TSS"
[1] 0.9763362
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1110134 0.9755946 0.9980813 0.9942781 0.9207309 0.9540639
Summary table of results
output_sum <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary.csv"))
kableExtra::kable(output_sum)
model                        Per.Expl  Ext.Dev.Expl  TPR    TSS    AUC    RMSE   Cor    PseudoR2
base_0m_Nspat_Ntag           78.734    0.724         0.739  0.870  0.979  0.231  0.888  0.787
base_0m_Nspat_Ytag           92.976    0.876         0.761  0.961  0.994  0.141  0.960  0.930
base_0m_Yspat_Ytag           93.544    0.887         0.770  0.964  0.995  0.125  0.963  0.935
do_0m_Nspat_Ytag             94.201    0.901         0.772  0.971  0.996  0.124  0.969  0.942
do_0m_Yspat_Ytag             95.618    0.920         0.788  0.977  0.997  0.110  0.976  0.956
do_0m_60m_Nspat_Ytag         94.865    0.908         0.775  0.973  0.997  0.119  0.972  0.949
do_0m_250m_Nspat_Ytag        95.069    0.909         0.783  0.974  0.996  0.119  0.972  0.951
do_0m_60m_250m_Nspat_Ytag    95.132    0.913         0.783  0.976  0.997  0.116  0.973  0.951
do_0m_60m_250m_Yspat_Ytag    95.186    0.918         0.784  0.977  0.997  0.113  0.975  0.952
agi_0m_Nspat_Ytag            93.845    0.901         0.765  0.971  0.997  0.124  0.970  0.938
agi_0m_Yspat_Ytag            94.754    0.916         0.776  0.975  0.998  0.114  0.974  0.948
agi_0m_60m_Nspat_Ytag        94.548    0.908         0.765  0.973  0.997  0.119  0.972  0.945
agi_0m_250m_Nspat_Ytag       93.059    0.897         0.767  0.967  0.997  0.129  0.967  0.931
agi_0m_60m_250m_Nspat_Ytag   94.111    0.907         0.767  0.972  0.997  0.122  0.971  0.941
agi_0m_60m_250m_Yspat_Ytag   95.406    0.920         0.777  0.976  0.998  0.111  0.975  0.954
ggplot(output_sum, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial models w/ tag ID
Base models: Relative to the CRW PA base models, these had drastically higher AUC scores and deviance explained values. The base model with no spatial or tag ID predictors was the lowest scoring model.
DO and AGI: Model performance generally increased with the added depth layers, but all models were fairly comparable to each other. Models with spatial and tag ID predictors performed the best, but as described in the CRW PA document, we will likely not include these predictors in these models, as they would not be included in the projection work and are not essential for addressing this study’s objectives.
The performance metrics across comparable DO and AGI models were much more similar to each other than they were in the models with the CRW PA data.
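For reference, the TPR and TSS values reported for each model can be related to a confusion matrix. The sketch below uses the standard definitions (TPR is sensitivity; TSS is sensitivity plus specificity minus 1) on a small made-up set of predictions. The helper name `tss_at_threshold`, the 0.5 threshold, and the toy data are assumptions for illustration only, and `explore_brt` may define or threshold these metrics differently.

```r
# Minimal sketch: TPR, TNR, and TSS from predicted probabilities at one threshold.
# `tss_at_threshold` is a hypothetical helper, not part of explore_brt().
tss_at_threshold <- function(obs, pred, threshold = 0.5) {
  pred_class <- as.integer(pred >= threshold)
  tp <- sum(pred_class == 1 & obs == 1)  # presences predicted as presences
  fn <- sum(pred_class == 0 & obs == 1)  # presences predicted as absences
  tn <- sum(pred_class == 0 & obs == 0)  # absences predicted as absences
  fp <- sum(pred_class == 1 & obs == 0)  # absences predicted as presences
  tpr <- tp / (tp + fn)                  # sensitivity
  tnr <- tn / (tn + fp)                  # specificity
  c(TPR = tpr, TNR = tnr, TSS = tpr + tnr - 1)
}

# Toy data (made up): 4 presences followed by 4 pseudo-absences
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0)
pred <- c(0.9, 0.8, 0.7, 0.3, 0.4, 0.2, 0.1, 0.6)

metrics <- tss_at_threshold(obs, pred)
print(metrics)
```

Because TSS depends on the threshold used to convert probabilities into presences and absences, the threshold should be reported alongside the score when comparing models.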
DO models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
0m, no spatial, no tag
0m, yes spatial, no tag
0m & 60m, no spatial, no tag
0m & 250m, no spatial, no tag
0m, 60m, & 250m, no spatial, no tag
0m, 60m, & 250m, yes spatial, no tag
explore_brt(mod_file_path = brt_outputs_Ntag[12],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.2227822
Correlation 0.9477851
AUC 0.9963000
Per.Expl 83.9296423
cvDeviance 0.5119148
cvCorrelation 0.8357045
cvAUC 0.9593600
cvPer.Expl 63.0731081
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 37.042591
o2_mean_0m 29.610389
temp_mean 8.255550
chl_mean 5.168471
ssh_mean 3.874249
sal_mean 3.296404
vostr_mean 2.770317
mld_mean 2.274538
bathy_sd 2.115492
uostr_mean 1.764755
uo_mean 1.535677
vo_mean 1.263401
pred_var 1.028166
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 1203.43
2 2 chl_mean 1 o2_mean_0m 685.80
3 11 bathy_mean 3 temp_mean 629.37
4 11 bathy_mean 5 uo_mean 482.06
5 9 ssh_mean 3 temp_mean 428.24
6 10 mld_mean 7 vo_mean 397.02
7 11 bathy_mean 9 ssh_mean 393.82
8 11 bathy_mean 4 sal_mean 391.94
[1] "External percent deviance explained"
[1] 0.7880609
[1] "TPR"
[1] 0.7442983
[1] "TSS"
[1] 0.9121023
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1971633 0.9208741 0.9883068 1.000565 0.7880609 0.8392964
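The `Per.Expl` and `cvPer.Expl` values in the performance block above appear to follow from the reported deviances as (1 - residual deviance / total deviance) x 100. The sketch below checks this against the numbers printed for the 0m, no spatial, no tag DO model. The helper name `pct_deviance_explained` is hypothetical, and treating `cvDeviance` as the residual term for the cross-validated value is an assumption.

```r
# Minimal sketch: percent deviance explained from total and residual deviance.
# `pct_deviance_explained` is a hypothetical helper, not part of explore_brt().
pct_deviance_explained <- function(total_dev, resid_dev) {
  (1 - resid_dev / total_dev) * 100
}

# Values copied from the printed output for the 0m, no spatial, no tag DO model
total_dev <- 1.3862928   # Total.Deviance
resid_dev <- 0.2227822   # Residual.Deviance
cv_dev    <- 0.5119148   # cvDeviance

train <- pct_deviance_explained(total_dev, resid_dev)
cv    <- pct_deviance_explained(total_dev, cv_dev)
print(round(c(train = train, cv = cv), 2))
```

The gap between the two values (roughly 84 versus 63 percent for this model) is one way to see how much of the fit does not carry over to held-out folds.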
explore_brt(mod_file_path = brt_outputs_Ntag[13],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.1924089
Correlation 0.9564733
AUC 0.9975000
Per.Expl 86.1206180
cvDeviance 0.4707743
cvCorrelation 0.8515532
cvAUC 0.9652100
cvPer.Expl 66.0407773
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 53.5317897
o2_mean_0m 12.2975738
lat 8.1708571
bathy_mean 5.8707423
chl_mean 3.6108876
temp_mean 3.2303458
sal_mean 2.5444041
ssh_mean 2.2134207
vostr_mean 1.7157868
mld_mean 1.5219427
uo_mean 1.1835532
bathy_sd 1.1832015
vo_mean 1.0970184
uostr_mean 0.9877975
pred_var 0.8406787
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 890.78
2 12 bathy_mean 4 temp_mean 775.24
3 14 dist_coast 11 mld_mean 564.53
4 3 chl_mean 2 o2_mean_0m 249.12
5 2 o2_mean_0m 1 lat 209.95
6 12 bathy_mean 1 lat 192.27
7 12 bathy_mean 6 uo_mean 190.20
8 12 bathy_mean 5 sal_mean 178.54
9 7 uostr_mean 2 o2_mean_0m 145.22
10 12 bathy_mean 3 chl_mean 137.46
11 12 bathy_mean 10 ssh_mean 127.01
[1] "External percent deviance explained"
[1] 0.8092043
[1] "TPR"
[1] 0.745814
[1] "TSS"
[1] 0.9206658
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1864288 0.9293438 0.9903905 0.9983164 0.8092043 0.8612062
explore_brt(mod_file_path = brt_outputs_Ntag[11],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.2018195
Correlation 0.9539294
AUC 0.9971000
Per.Expl 85.4417866
cvDeviance 0.4904242
cvCorrelation 0.8443609
cvAUC 0.9626400
cvPer.Expl 64.6233345
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.8872618
o2_mean_0m 29.0113534
o2_mean_60m 11.2428960
ssh_mean 4.9169342
chl_mean 4.7561458
temp_mean 4.0546910
sal_mean 2.9788618
vostr_mean 1.8897280
mld_mean 1.8536327
bathy_sd 1.6149277
uo_mean 1.4543292
uostr_mean 1.3424132
vo_mean 1.1227679
pred_var 0.8740571
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 811.01
2 6 uostr_mean 1 o2_mean_0m 445.93
3 2 chl_mean 1 o2_mean_0m 381.63
4 11 bathy_mean 4 sal_mean 370.20
5 10 mld_mean 7 vo_mean 347.32
6 11 bathy_mean 5 uo_mean 325.10
7 13 o2_mean_60m 3 temp_mean 284.60
8 9 ssh_mean 2 chl_mean 279.34
9 11 bathy_mean 3 temp_mean 279.32
10 11 bathy_mean 1 o2_mean_0m 238.68
[1] "External percent deviance explained"
[1] 0.8035579
[1] "TPR"
[1] 0.7452064
[1] "TSS"
[1] 0.918435
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.189613 0.9268764 0.9898308 1.000781 0.8035579 0.8544179
explore_brt(mod_file_path = brt_outputs_Ntag[8],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.2217747
Correlation 0.9471965
AUC 0.9959000
Per.Expl 84.0023199
cvDeviance 0.4919018
cvCorrelation 0.8427542
cvAUC 0.9622200
cvPer.Expl 64.5167458
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.0545282
o2_mean_0m 30.5314940
o2_mean_250m 15.9557395
temp_mean 3.9670785
chl_mean 3.9145848
sal_mean 2.7179083
ssh_mean 2.6863495
bathy_sd 1.7546135
vostr_mean 1.5258534
mld_mean 1.4129971
uostr_mean 1.3536665
uo_mean 1.3228056
vo_mean 1.0371771
pred_var 0.7652041
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 733.46
2 13 o2_mean_250m 1 o2_mean_0m 339.34
3 2 chl_mean 1 o2_mean_0m 329.37
4 13 o2_mean_250m 4 sal_mean 297.21
5 6 uostr_mean 5 uo_mean 278.72
6 11 bathy_mean 3 temp_mean 265.17
7 11 bathy_mean 1 o2_mean_0m 261.90
8 11 bathy_mean 4 sal_mean 224.97
9 9 ssh_mean 4 sal_mean 194.26
10 4 sal_mean 1 o2_mean_0m 187.63
[1] "External percent deviance explained"
[1] 0.7906954
[1] "TPR"
[1] 0.744449
[1] "TSS"
[1] 0.9119212
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1971733 0.9205675 0.9884804 0.9982947 0.7906954 0.8400232
explore_brt(mod_file_path = brt_outputs_Ntag[9],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.1928786
Correlation 0.9567911
AUC 0.9976000
Per.Expl 86.0867365
cvDeviance 0.4815743
cvCorrelation 0.8475797
cvAUC 0.9638500
cvPer.Expl 65.2617211
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 30.1719892
bathy_mean 29.0380795
o2_mean_250m 12.9073690
o2_mean_60m 7.1186837
chl_mean 3.2788350
temp_mean 3.2112162
sal_mean 2.8197533
ssh_mean 2.5431217
bathy_sd 1.6008272
vostr_mean 1.4381590
mld_mean 1.4332633
uostr_mean 1.3219324
uo_mean 1.2491223
vo_mean 1.0814297
pred_var 0.7862186
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 632.37
2 2 chl_mean 1 o2_mean_0m 631.48
3 13 o2_mean_60m 3 temp_mean 326.47
4 11 bathy_mean 5 uo_mean 292.91
5 14 o2_mean_250m 1 o2_mean_0m 266.03
6 11 bathy_mean 3 temp_mean 212.56
7 11 bathy_mean 4 sal_mean 183.05
8 6 uostr_mean 5 uo_mean 176.12
9 14 o2_mean_250m 11 bathy_mean 166.45
10 4 sal_mean 1 o2_mean_0m 153.66
11 9 ssh_mean 4 sal_mean 153.50
[1] "External percent deviance explained"
[1] 0.8106455
[1] "TPR"
[1] 0.7457058
[1] "TSS"
[1] 0.922993
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1851259 0.9305293 0.9905704 0.9997649 0.8106455 0.8608674
explore_brt(mod_file_path = brt_outputs_Ntag[10],
            test_data = do_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.1730551
Correlation 0.9630163
AUC 0.9983000
Per.Expl 87.5167023
cvDeviance 0.4571302
cvCorrelation 0.8567712
cvAUC 0.9671700
cvPer.Expl 67.0249860
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.0947600
o2_mean_0m 11.9843937
o2_mean_250m 7.5420896
lat 5.0225211
bathy_mean 3.7536001
o2_mean_60m 3.5987315
chl_mean 2.9175112
temp_mean 2.7347070
sal_mean 2.4927940
ssh_mean 1.8254724
mld_mean 1.3819650
vostr_mean 1.1186016
uo_mean 1.0635226
uostr_mean 0.9547572
bathy_sd 0.9449802
vo_mean 0.9346186
pred_var 0.6349742
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 4 temp_mean 2 o2_mean_0m 1104.53
2 12 bathy_mean 4 temp_mean 566.77
3 14 dist_coast 11 mld_mean 551.57
4 12 bathy_mean 1 lat 485.06
5 12 bathy_mean 5 sal_mean 373.98
6 5 sal_mean 2 o2_mean_0m 254.84
7 12 bathy_mean 3 chl_mean 244.77
8 16 o2_mean_250m 12 bathy_mean 192.70
9 15 o2_mean_60m 4 temp_mean 184.99
10 16 o2_mean_250m 2 o2_mean_0m 154.32
11 16 o2_mean_250m 1 lat 133.79
12 2 o2_mean_0m 1 lat 130.57
13 15 o2_mean_60m 3 chl_mean 117.28
14 3 chl_mean 2 o2_mean_0m 114.22
[1] "External percent deviance explained"
[1] 0.8235084
[1] "TPR"
[1] 0.7466788
[1] "TSS"
[1] 0.9315503
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1776279 0.9362389 0.9917015 0.9991013 0.8235084 0.875167
AGI models w/o tag ID
Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.
0m, no spatial, no tag
0m, yes spatial, no tag
0m & 60m, no spatial, no tag
0m & 250m, no spatial, no tag
0m, 60m, & 250m, no spatial, no tag
0m, 60m, & 250m, yes spatial, no tag
explore_brt(mod_file_path = brt_outputs_Ntag[5],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.2497858
Correlation 0.9395089
AUC 0.9945000
Per.Expl 81.9816994
cvDeviance 0.5307930
cvCorrelation 0.8283713
cvAUC 0.9563800
cvPer.Expl 61.7112444
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 36.9573034
temp_mean 21.9226886
AGI_0m 9.9456826
ssh_mean 5.1190726
uostr_mean 5.0894934
sal_mean 4.6868399
chl_mean 4.6532916
vostr_mean 3.5292848
bathy_sd 2.1704713
uo_mean 1.7266251
mld_mean 1.7123633
vo_mean 1.5145409
pred_var 0.9723426
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 6924.06
2 10 bathy_mean 4 uo_mean 486.67
3 12 AGI_0m 8 ssh_mean 471.86
4 10 bathy_mean 8 ssh_mean 421.48
5 12 AGI_0m 4 uo_mean 404.95
6 10 bathy_mean 2 temp_mean 375.23
7 10 bathy_mean 3 sal_mean 341.20
8 8 ssh_mean 5 uostr_mean 230.04
[1] "External percent deviance explained"
[1] 0.7650036
[1] "TPR"
[1] 0.7425692
[1] "TSS"
[1] 0.8966394
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2098667 0.9095963 0.985447 0.993921 0.7650036 0.819817
explore_brt(mod_file_path = brt_outputs_Ntag[6],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1967422
Correlation 0.9556098
AUC 0.9974000
Per.Expl 85.8080043
cvDeviance 0.4860996
cvCorrelation 0.8459321
cvAUC 0.9628400
cvPer.Expl 64.9352059
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 53.5244509
lat 10.5089650
AGI_0m 7.1058947
bathy_mean 6.5961745
temp_mean 5.3593100
chl_mean 3.3173597
sal_mean 2.8581691
ssh_mean 2.0268410
uo_mean 1.5245115
vostr_mean 1.4444895
mld_mean 1.3388584
bathy_sd 1.2377161
uostr_mean 1.2096878
vo_mean 1.1623617
pred_var 0.7852103
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 2714.25
2 13 AGI_0m 1 lat 626.22
3 11 bathy_mean 3 temp_mean 463.19
4 11 bathy_mean 2 chl_mean 314.46
5 3 temp_mean 1 lat 308.75
6 11 bathy_mean 5 uo_mean 282.97
7 13 AGI_0m 11 bathy_mean 248.68
8 11 bathy_mean 1 lat 234.37
9 14 dist_coast 8 vostr_mean 177.05
10 11 bathy_mean 9 ssh_mean 176.04
11 11 bathy_mean 4 sal_mean 173.67
[1] "External percent deviance explained"
[1] 0.8051854
[1] "TPR"
[1] 0.7454889
[1] "TSS"
[1] 0.9191731
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1888351 0.9274899 0.9901186 0.9976121 0.8051854 0.85808
explore_brt(mod_file_path = brt_outputs_Ntag[4],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1981884
Correlation 0.9571046
AUC 0.9975000
Per.Expl 85.7036795
cvDeviance 0.5045864
cvCorrelation 0.8398032
cvAUC 0.9604600
cvPer.Expl 63.6016628
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 34.034848
temp_mean 21.042165
AGI_0m 9.921024
AGI_60m 5.785260
sal_mean 5.106237
uostr_mean 4.656571
chl_mean 4.204098
ssh_mean 4.041292
vostr_mean 3.376607
bathy_sd 1.942936
uo_mean 1.777164
mld_mean 1.692922
vo_mean 1.399912
pred_var 1.018965
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 6728.90
2 10 bathy_mean 2 temp_mean 594.17
3 12 AGI_0m 8 ssh_mean 365.47
4 10 bathy_mean 3 sal_mean 336.23
5 10 bathy_mean 8 ssh_mean 334.83
6 13 AGI_60m 10 bathy_mean 328.60
7 10 bathy_mean 4 uo_mean 286.71
8 12 AGI_0m 4 uo_mean 200.22
9 13 AGI_60m 2 temp_mean 196.14
10 5 uostr_mean 2 temp_mean 190.43
[1] "External percent deviance explained"
[1] 0.7987083
[1] "TPR"
[1] 0.7448015
[1] "TSS"
[1] 0.9123278
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5100 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1917355 0.9252577 0.9893765 0.993329 0.7987083 0.8570368
explore_brt(mod_file_path = brt_outputs_Ntag[1],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.2223382
Correlation 0.9477111
AUC 0.9960000
Per.Expl 83.9616359
cvDeviance 0.5036610
cvCorrelation 0.8404252
cvAUC 0.9600700
cvPer.Expl 63.6684121
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.079609
temp_mean 20.440342
AGI_250m 12.927116
AGI_0m 8.557389
uostr_mean 5.276555
ssh_mean 4.592386
sal_mean 4.485481
chl_mean 3.402499
bathy_sd 2.123828
vostr_mean 2.027799
uo_mean 1.572834
vo_mean 1.438505
mld_mean 1.305188
pred_var 0.770471
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 4454.06
2 10 bathy_mean 3 sal_mean 445.32
3 13 AGI_250m 2 temp_mean 412.52
4 13 AGI_250m 3 sal_mean 286.51
5 13 AGI_250m 10 bathy_mean 285.61
6 12 AGI_0m 10 bathy_mean 283.01
7 10 bathy_mean 2 temp_mean 274.09
8 12 AGI_0m 4 uo_mean 267.22
9 12 AGI_0m 8 ssh_mean 234.66
10 12 AGI_0m 11 bathy_sd 234.29
[1] "External percent deviance explained"
[1] 0.7846544
[1] "TPR"
[1] 0.7438603
[1] "TSS"
[1] 0.9071105
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1991261 0.9190031 0.9874679 0.9947123 0.7846544 0.8396164
explore_brt(mod_file_path = brt_outputs_Ntag[2],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1822323
Correlation 0.9612794
AUC 0.9981000
Per.Expl 86.8546720
cvDeviance 0.4900686
cvCorrelation 0.8457470
cvAUC 0.9625500
cvPer.Expl 64.6489000
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.5789429
temp_mean 20.2111119
AGI_250m 12.8943905
AGI_0m 8.2276644
uostr_mean 5.6018303
sal_mean 4.1762023
ssh_mean 3.7524824
AGI_60m 3.4167547
chl_mean 3.2497693
vostr_mean 1.9240122
bathy_sd 1.8515668
uo_mean 1.5600765
mld_mean 1.3810041
vo_mean 1.3729043
pred_var 0.8012876
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 5794.97
2 10 bathy_mean 3 sal_mean 437.23
3 14 AGI_250m 2 temp_mean 429.14
4 12 AGI_0m 10 bathy_mean 421.21
5 13 AGI_60m 10 bathy_mean 414.15
6 12 AGI_0m 8 ssh_mean 331.82
7 4 uo_mean 2 temp_mean 322.86
8 10 bathy_mean 4 uo_mean 294.72
9 10 bathy_mean 2 temp_mean 294.26
10 14 AGI_250m 3 sal_mean 239.00
11 8 ssh_mean 3 sal_mean 167.69
[1] "External percent deviance explained"
[1] 0.8090627
[1] "TPR"
[1] 0.7453477
[1] "TSS"
[1] 0.9220812
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5150 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1852343 0.9303529 0.9900014 0.9932172 0.8090627 0.8685467
explore_brt(mod_file_path = brt_outputs_Ntag[3],
            test_data = agi_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.1859167
Correlation 0.9590992
AUC 0.9978000
Per.Expl 86.5888987
cvDeviance 0.4697185
cvCorrelation 0.8519017
cvAUC 0.9651800
cvPer.Expl 66.1168520
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.8697060
lat 9.6962485
AGI_0m 6.1261413
bathy_mean 5.4406471
AGI_250m 4.8968222
temp_mean 4.8373778
chl_mean 2.9005605
sal_mean 2.7254972
AGI_60m 2.2333278
ssh_mean 1.9217436
uo_mean 1.3083809
mld_mean 1.2604230
vostr_mean 1.2375953
vo_mean 1.0230573
uostr_mean 0.9963094
bathy_sd 0.9226174
pred_var 0.6035448
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 1149.59
2 13 AGI_0m 1 lat 966.63
3 13 AGI_0m 11 bathy_mean 645.51
4 15 AGI_60m 11 bathy_mean 335.90
5 11 bathy_mean 3 temp_mean 275.79
6 11 bathy_mean 5 uo_mean 265.44
7 6 uostr_mean 1 lat 221.40
8 16 AGI_250m 11 bathy_mean 194.80
9 13 AGI_0m 5 uo_mean 193.30
10 11 bathy_mean 1 lat 183.75
11 3 temp_mean 1 lat 176.25
12 15 AGI_60m 3 temp_mean 165.89
13 11 bathy_mean 2 chl_mean 152.85
14 8 vostr_mean 5 uo_mean 137.36
[1] "External percent deviance explained"
[1] 0.8114904
[1] "TPR"
[1] 0.7455845
[1] "TSS"
[1] 0.9264432
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.184201 0.9311473 0.9903269 0.9962682 0.8114904 0.865889
Summary table of results
output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)
base_0m_Nspat_Ntag          78.734 0.724 0.739 0.870 0.979 0.231 0.888 0.787
do_0m_Nspat_Ntag            83.930 0.785 0.744 0.906 0.987 0.199 0.919 0.839
do_0m_Yspat_Ntag            86.121 0.810 0.746 0.921 0.990 0.186 0.930 0.861
do_0m_60m_Nspat_Ntag        85.442 0.802 0.745 0.919 0.989 0.189 0.927 0.854
do_0m_250m_Nspat_Ntag       84.002 0.789 0.744 0.910 0.987 0.197 0.920 0.840
do_0m_60m_250m_Nspat_Ntag   86.087 0.809 0.746 0.917 0.990 0.187 0.929 0.861
do_0m_60m_250m_Yspat_Ntag   87.517 0.823 0.747 0.928 0.992 0.179 0.935 0.875
agi_0m_Nspat_Ntag           81.982 0.775 0.743 0.903 0.987 0.204 0.915 0.820
agi_0m_Yspat_Ntag           85.808 0.809 0.746 0.922 0.990 0.186 0.930 0.858
agi_0m_60m_Nspat_Ntag       85.704 0.805 0.745 0.922 0.990 0.187 0.929 0.857
agi_0m_250m_Nspat_Ntag      83.962 0.793 0.744 0.914 0.988 0.195 0.923 0.840
agi_0m_60m_250m_Nspat_Ntag  86.855 0.818 0.746 0.928 0.991 0.179 0.935 0.869
agi_0m_60m_250m_Yspat_Ntag  86.589 0.820 0.746 0.928 0.991 0.180 0.935 0.866
output_sum_Ntag_Nspat <- output_sum_Ntag %>%
  filter(!grepl("Yspat", model))
ggplot(output_sum_Ntag_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial models w/o tag ID
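Considering only the models without spatial predictors, the AGI and DO models performed similarly. In terms of percent deviance explained (Per.Expl), the AGI models scored slightly higher than the comparable DO models when the 60m layer or all depth layers were included (85.7 vs. 85.4 and 86.9 vs. 86.1), the two were essentially tied with the 250m layer added (84.0 each), and the surface-only AGI model scored lower than the surface-only DO model (82.0 vs. 83.9).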
The AGI model with all depth layers performed the best of the models without spatial predictors, and somewhat better than the comparable DO model (percent deviance explained of 86.9 vs. 86.1).
For the DO model with all depth layers, DO_0m was the predictor variable with the highest relative influence, but was closely followed by bathymetry. DO_250m was the third most influential predictor, but its relative influence was considerably lower than that of DO_0m and bathymetry. Partial plots show drastically different relationships than the CRW PA models, with DO_250m having a positive correlation and DO_0m having an inverse sweet spot.
For the AGI model with all depth layers, bathymetry and temperature were the two predictors with the highest relative influence, and AGI 250m was listed third, somewhat closely followed by AGI 0m. The partial plots for these two variables are similar to the DO models, but less extreme.
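The random `pred_var` predictor gives a simple baseline for screening predictors: a variable whose relative influence does not exceed that of `pred_var` is a candidate for removal. The sketch below applies that rule to the relative influence values printed above for the DO model with all depth layers and no spatial predictors. The rule itself (strictly greater than `pred_var`) is an assumption, since the exact criterion is not stated here.

```r
# Minimal sketch: screen predictors against the random `pred_var` baseline.
# Relative influence values are copied from the printed output for the
# DO model with 0m, 60m, and 250m layers (no spatial, no tag).
rel_inf <- c(
  o2_mean_0m   = 30.1719892, bathy_mean = 29.0380795,
  o2_mean_250m = 12.9073690, o2_mean_60m = 7.1186837,
  chl_mean     = 3.2788350,  temp_mean  = 3.2112162,
  sal_mean     = 2.8197533,  ssh_mean   = 2.5431217,
  bathy_sd     = 1.6008272,  vostr_mean = 1.4381590,
  mld_mean     = 1.4332633,  uostr_mean = 1.3219324,
  uo_mean      = 1.2491223,  vo_mean    = 1.0814297,
  pred_var     = 0.7862186
)

baseline <- rel_inf[["pred_var"]]

# Assumed rule: keep predictors with influence strictly above the baseline
keep <- names(rel_inf)[rel_inf > baseline]
drop <- setdiff(names(rel_inf), c(keep, "pred_var"))

print(keep)
print(drop)
```

For this model every predictor exceeds the `pred_var` baseline, so this rule alone would not remove any of them.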
Base models w/o tag ID and w/ data at seasonal and annual resolutions
For these models, the environmental raster data were averaged by season and by year. Environmental data were then extracted from these averaged rasters at the observed and pseudo-absence locations, with each location matched to the raster for its season or year.
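As a rough illustration of the matching step described above, the sketch below averages daily covariate values by season and by year, then joins those averages back to point locations according to each point's season or year. It works on a plain data frame rather than raster files, and all names (`daily`, `pts`, `season_of`, the month-to-season mapping, the toy values) are hypothetical; the actual workflow operates on rasters.

```r
# Minimal sketch, assuming a simple month-to-season mapping.
# Note: December is grouped with January and February of the same calendar
# year here, which is a simplification.
season_of <- function(date) {
  m <- as.integer(format(date, "%m"))
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[m]
}

# Toy daily covariate values (made up)
daily <- data.frame(
  date = as.Date(c("2010-01-15", "2010-02-15", "2010-07-01",
                   "2010-08-01", "2011-01-10")),
  temp = c(10, 12, 20, 22, 11)
)
daily$season <- season_of(daily$date)
daily$year   <- as.integer(format(daily$date, "%Y"))

# Seasonal and annual means of the covariate
seas_mean <- aggregate(temp ~ season + year, data = daily, FUN = mean)
names(seas_mean)[names(seas_mean) == "temp"] <- "temp_seas"
ann_mean <- aggregate(temp ~ year, data = daily, FUN = mean)
names(ann_mean)[names(ann_mean) == "temp"] <- "temp_ann"

# Toy observed / pseudo-absence points, matched by their own season and year
pts <- data.frame(
  id   = 1:3,
  date = as.Date(c("2010-01-20", "2010-07-15", "2011-01-05"))
)
pts$season <- season_of(pts$date)
pts$year   <- as.integer(format(pts$date, "%Y"))

pts <- merge(pts, seas_mean, by = c("season", "year"), all.x = TRUE)
pts <- merge(pts, ann_mean, by = "year", all.x = TRUE)
pts <- pts[order(pts$id), ]
print(pts)
```

With rasters, the same idea would apply per averaged layer: each point takes its value from the layer that corresponds to its season or year.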
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_base_0m_seas_Nspat_Ntag.rds",
            test_data = base_test_seasonal)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862930
Residual.Deviance 0.3718825
Correlation 0.8913160
AUC 0.9811000
Per.Expl 73.1743220
cvDeviance 0.5439165
cvCorrelation 0.8203274
cvAUC 0.9543000
cvPer.Expl 60.7646778
[1] "Relative influence of predictor variables"
rel.inf
vo_mean 37.7484397
vostr_mean 13.0995207
bathy_mean 9.5110530
uostr_mean 8.6973249
ssh_mean 8.4917915
sal_mean 6.2551008
temp_mean 5.2891595
mld_mean 3.9670224
chl_mean 2.9443099
uo_mean 1.9103745
bathy_sd 1.2421311
pred_var 0.8437721
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 2 sal_mean 1 mld_mean 1130.27
2 10 bathy_mean 6 uostr_mean 473.01
3 8 vostr_mean 4 temp_mean 345.16
4 7 vo_mean 3 ssh_mean 238.10
5 7 vo_mean 2 sal_mean 188.96
6 8 vostr_mean 3 ssh_mean 179.53
7 4 temp_mean 2 sal_mean 164.83
[1] "External percent deviance explained"
[1] 0.7074127
[1] "TPR"
[1] 0.7379161
[1] "TSS"
[1] 0.8507636
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2415391 0.8768613 0.9764832 1.000477 0.7074127 0.7317432
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_base_0m_ann_Nspat_Ntag.rds",
            test_data = base_test_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862892
Residual.Deviance 0.3485522
Correlation 0.9016247
AUC 0.9844000
Per.Expl 74.8571794
cvDeviance 0.5423354
cvCorrelation 0.8223235
cvAUC 0.9539500
cvPer.Expl 60.8786270
[1] "Relative influence of predictor variables"
rel.inf
vo_mean 38.5581912
vostr_mean 16.9760400
uostr_mean 11.7424763
bathy_mean 10.1812331
chl_mean 5.1572250
sal_mean 4.4084308
temp_mean 3.6019413
ssh_mean 3.1522272
mld_mean 2.3610565
uo_mean 1.8300039
bathy_sd 1.2618894
pred_var 0.7692852
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 vostr_mean 6 uostr_mean 1088.69
2 7 vo_mean 6 uostr_mean 501.60
3 10 bathy_mean 8 vostr_mean 396.66
4 3 ssh_mean 1 mld_mean 391.80
5 8 vostr_mean 3 ssh_mean 319.72
6 8 vostr_mean 4 temp_mean 298.87
7 8 vostr_mean 1 mld_mean 259.45
[1] "External percent deviance explained"
[1] 0.719988
[1] "TPR"
[1] 0.7389636
[1] "TSS"
[1] 0.8595799
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2344847 0.8846899 0.9784438 0.9965839 0.719988 0.7485718
DO models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag
Seasonal, Yspat, Ntag
Annual, Nspat, Ntag
Annual, Yspat, Ntag
Daily, Seasonal, and Annual, Nspat, Ntag
Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2470047
Correlation 0.9382402
AUC 0.9942000
Per.Expl 82.1822454
cvDeviance 0.4886605
cvCorrelation 0.8432439
cvAUC 0.9622300
cvPer.Expl 64.7503346
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m_seas 29.8253334
bathy_mean 28.6073228
o2_mean_250m_seas 12.8824390
o2_mean_60m_seas 7.4286303
ssh_mean 3.7142049
chl_mean 3.4442744
temp_mean 3.3428766
sal_mean 2.5604620
uostr_mean 1.4038380
mld_mean 1.3638226
bathy_sd 1.3610024
vostr_mean 1.2779076
uo_mean 1.2051556
vo_mean 0.8640793
pred_var 0.7186511
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 14 o2_mean_60m_seas 3 sal_mean 327.14
2 15 o2_mean_250m_seas 10 bathy_mean 303.66
3 10 bathy_mean 1 chl_mean 255.27
4 10 bathy_mean 4 uo_mean 243.69
5 10 bathy_mean 8 ssh_mean 225.29
6 15 o2_mean_250m_seas 13 o2_mean_0m_seas 198.72
7 10 bathy_mean 2 temp_mean 198.07
8 13 o2_mean_0m_seas 8 ssh_mean 191.82
9 13 o2_mean_0m_seas 2 temp_mean 185.50
10 10 bathy_mean 3 sal_mean 165.98
11 13 o2_mean_0m_seas 10 bathy_mean 164.29
[1] "External percent deviance explained"
[1] 0.7809057
[1] "TPR"
[1] 0.7438538
[1] "TSS"
[1] 0.9010958
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6850 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2030244 0.9156525 0.9876733 0.9957416 0.7809057 0.8218225
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2393353
Correlation 0.9405329
AUC 0.9946000
Per.Expl 82.7354798
cvDeviance 0.4837428
cvCorrelation 0.8448644
cvAUC 0.9630300
cvPer.Expl 65.1050721
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.7884390
o2_mean_0m_seas 11.3114005
o2_mean_250m_seas 8.0490924
o2_mean_60m_seas 4.3433808
bathy_mean 4.0318500
lat 3.9720294
chl_mean 3.0187321
temp_mean 2.7062765
sal_mean 2.2069866
ssh_mean 1.8282680
mld_mean 1.2179747
uo_mean 1.0978100
vostr_mean 1.0915666
bathy_sd 1.0099016
uostr_mean 0.9010789
vo_mean 0.8156201
pred_var 0.6095928
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 394.73
2 17 o2_mean_250m_seas 11 bathy_mean 298.69
3 11 bathy_mean 2 chl_mean 293.55
4 11 bathy_mean 5 uo_mean 198.66
5 16 o2_mean_60m_seas 4 sal_mean 174.92
6 16 o2_mean_60m_seas 11 bathy_mean 171.41
7 15 o2_mean_0m_seas 1 lat 160.69
8 11 bathy_mean 9 ssh_mean 156.06
9 13 dist_coast 10 mld_mean 151.33
10 11 bathy_mean 4 sal_mean 125.29
11 6 uostr_mean 1 lat 112.97
12 15 o2_mean_0m_seas 4 sal_mean 111.65
13 16 o2_mean_60m_seas 3 temp_mean 100.81
14 9 ssh_mean 4 sal_mean 99.29
[1] "External percent deviance explained"
[1] 0.7858132
[1] "TPR"
[1] 0.7442682
[1] "TSS"
[1] 0.9024247
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6800 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2008101 0.9175289 0.9882875 0.9945688 0.7858132 0.8273548
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2418740
Correlation 0.9426300
AUC 0.9953000
Per.Expl 82.5523491
cvDeviance 0.5203305
cvCorrelation 0.8308027
cvAUC 0.9580000
cvPer.Expl 62.4658114
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.6559956
o2_mean_0m_ann 22.1577042
o2_mean_250m_ann 13.9331703
temp_mean 7.5838024
o2_mean_60m_ann 7.1954997
chl_mean 4.4958699
sal_mean 3.5287622
ssh_mean 2.9167102
uostr_mean 2.2307148
vostr_mean 1.6575902
mld_mean 1.5932019
bathy_sd 1.5853359
uo_mean 1.4712842
vo_mean 1.1971742
pred_var 0.7971843
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 14 o2_mean_60m_ann 2 temp_mean 476.01
2 14 o2_mean_60m_ann 10 bathy_mean 249.69
3 10 bathy_mean 2 temp_mean 246.40
4 10 bathy_mean 1 chl_mean 230.23
5 10 bathy_mean 4 uo_mean 223.84
6 10 bathy_mean 3 sal_mean 190.35
7 8 ssh_mean 5 uostr_mean 157.00
8 10 bathy_mean 8 ssh_mean 137.07
9 14 o2_mean_60m_ann 13 o2_mean_0m_ann 133.08
10 13 o2_mean_0m_ann 3 sal_mean 125.77
11 15 o2_mean_250m_ann 13 o2_mean_0m_ann 123.86
[1] "External percent deviance explained"
[1] 0.7776352
[1] "TPR"
[1] 0.7436879
[1] "TSS"
[1] 0.9021378
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2034744 0.9156027 0.9876415 0.9972787 0.7776352 0.8255235
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.2251524
Correlation 0.9476293
AUC 0.9962000
Per.Expl 83.7585691
cvDeviance 0.5016073
cvCorrelation 0.8374576
cvAUC 0.9609200
cvPer.Expl 63.8164123
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.0391515
lat 7.5722313
o2_mean_250m_ann 5.9824841
o2_mean_0m_ann 5.6170844
bathy_mean 5.3985391
chl_mean 4.1449974
temp_mean 3.4237513
o2_mean_60m_ann 3.1501892
sal_mean 2.8294737
ssh_mean 2.2021890
vostr_mean 1.4298185
mld_mean 1.3292879
uo_mean 1.2080682
bathy_sd 1.0629375
uostr_mean 1.0065255
vo_mean 0.9086555
pred_var 0.6946158
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 552.75
2 16 o2_mean_60m_ann 11 bathy_mean 437.44
3 11 bathy_mean 2 chl_mean 329.67
4 16 o2_mean_60m_ann 3 temp_mean 246.01
5 6 uostr_mean 1 lat 204.09
6 13 dist_coast 10 mld_mean 180.83
7 15 o2_mean_0m_ann 1 lat 172.78
8 11 bathy_mean 9 ssh_mean 157.39
9 11 bathy_mean 5 uo_mean 129.80
10 16 o2_mean_60m_ann 4 sal_mean 121.32
11 8 vostr_mean 5 uo_mean 119.76
12 16 o2_mean_60m_ann 1 lat 111.11
13 17 o2_mean_250m_ann 1 lat 97.43
14 3 temp_mean 1 lat 84.62
[1] "External percent deviance explained"
[1] 0.7932303
[1] "TPR"
[1] 0.7449284
[1] "TSS"
[1] 0.9062309
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8400 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1955452 0.9224583 0.9896757 0.9959646 0.7932303 0.8375857
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.1951649
Correlation 0.9560043
AUC 0.9974000
Per.Expl 85.9217263
cvDeviance 0.4585485
cvCorrelation 0.8553534
cvAUC 0.9665500
cvPer.Expl 66.9224706
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.7948276
o2_mean_0m 21.5078589
o2_mean_250m_seas 10.0195711
o2_mean_0m_seas 8.3152092
o2_mean_60m_seas 5.4522658
o2_mean_250m_ann 3.7071018
o2_mean_0m_ann 3.4441561
chl_mean 2.6938371
temp_mean 2.5287651
o2_mean_60m_ann 1.9918635
sal_mean 1.9409870
ssh_mean 1.9325752
o2_mean_250m 1.8523367
o2_mean_60m 1.6689402
vostr_mean 1.0311990
bathy_sd 1.0131874
mld_mean 1.0037532
uostr_mean 0.9663831
uo_mean 0.9203555
vo_mean 0.6778727
pred_var 0.5369536
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 370.38
2 18 o2_mean_250m_seas 14 o2_mean_250m 306.19
3 18 o2_mean_250m_seas 11 bathy_mean 305.67
4 13 o2_mean_60m 11 bathy_mean 224.97
5 17 o2_mean_60m_seas 4 sal_mean 204.17
6 11 bathy_mean 2 chl_mean 135.89
7 20 o2_mean_60m_ann 11 bathy_mean 134.81
8 11 bathy_mean 3 temp_mean 133.33
9 11 bathy_mean 5 uo_mean 129.99
10 2 chl_mean 1 o2_mean_0m 114.45
11 11 bathy_mean 4 sal_mean 102.36
12 16 o2_mean_0m_seas 3 temp_mean 96.45
13 16 o2_mean_0m_seas 9 ssh_mean 94.23
14 20 o2_mean_60m_ann 7 vo_mean 90.49
15 11 bathy_mean 9 ssh_mean 87.48
16 4 sal_mean 3 temp_mean 83.53
17 8 vostr_mean 1 o2_mean_0m 81.07
18 21 o2_mean_250m_ann 18 o2_mean_250m_seas 80.52
19 18 o2_mean_250m_seas 1 o2_mean_0m 72.67
20 6 uostr_mean 1 o2_mean_0m 72.01
21 19 o2_mean_0m_ann 17 o2_mean_60m_seas 67.89
22 13 o2_mean_60m 3 temp_mean 66.18
[1] "External percent deviance explained"
[1] 0.8162526
[1] "TPR"
[1] 0.7459763
[1] "TSS"
[1] 0.9263903
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1816337 0.933411 0.9914571 0.9954305 0.8162526 0.8592173
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862840
Residual.Deviance 0.1892159
Correlation 0.9579015
AUC 0.9977000
Per.Expl 86.3508545
cvDeviance 0.4502540
cvCorrelation 0.8578266
cvAUC 0.9676900
cvPer.Expl 67.5207993
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 49.9204334
o2_mean_0m 8.5367913
o2_mean_0m_seas 5.4191441
o2_mean_250m_seas 4.6052682
lat 3.6613526
bathy_mean 3.5322270
o2_mean_60m_seas 3.1409177
chl_mean 2.4859345
o2_mean_250m_ann 2.4025619
temp_mean 2.2239870
sal_mean 1.7501465
o2_mean_60m 1.6449577
o2_mean_250m 1.5228690
o2_mean_60m_ann 1.5065172
ssh_mean 1.4200972
o2_mean_0m_ann 0.9673241
vostr_mean 0.9644953
mld_mean 0.9434420
uo_mean 0.8649698
bathy_sd 0.7097241
uostr_mean 0.6677465
vo_mean 0.6428025
pred_var 0.4662903
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 20 o2_mean_250m_seas 12 bathy_mean 350.89
2 12 bathy_mean 3 chl_mean 339.39
3 12 bathy_mean 4 temp_mean 252.74
4 20 o2_mean_250m_seas 16 o2_mean_250m 233.93
5 12 bathy_mean 5 sal_mean 231.91
6 14 dist_coast 9 vostr_mean 230.46
7 22 o2_mean_60m_ann 12 bathy_mean 220.69
8 4 temp_mean 2 o2_mean_0m 210.49
9 14 dist_coast 11 mld_mean 185.40
10 15 o2_mean_60m 12 bathy_mean 177.56
11 3 chl_mean 2 o2_mean_0m 147.74
12 12 bathy_mean 6 uo_mean 144.14
13 12 bathy_mean 10 ssh_mean 142.95
14 9 vostr_mean 2 o2_mean_0m 136.67
15 15 o2_mean_60m 4 temp_mean 103.32
16 21 o2_mean_0m_ann 1 lat 99.30
17 19 o2_mean_60m_seas 5 sal_mean 97.76
18 18 o2_mean_0m_seas 13 bathy_sd 89.75
19 2 o2_mean_0m 1 lat 78.99
20 18 o2_mean_0m_seas 1 lat 77.18
21 22 o2_mean_60m_ann 2 o2_mean_0m 76.72
22 22 o2_mean_60m_ann 1 lat 68.39
23 18 o2_mean_0m_seas 10 ssh_mean 61.97
24 23 o2_mean_250m_ann 20 o2_mean_250m_seas 60.12
25 18 o2_mean_0m_seas 8 vo_mean 56.72
26 21 o2_mean_0m_ann 19 o2_mean_60m_seas 56.55
[1] "External percent deviance explained"
[1] 0.8206347
[1] "TPR"
[1] 0.7463549
[1] "TSS"
[1] 0.9268552
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1791644 0.9353569 0.9920654 0.9950274 0.8206347 0.8635085
AGI models w/o tag ID and w/ data at seasonal and annual resolutions
Seasonal, Nspat, Ntag | Seasonal, Yspat, Ntag | Annual, Nspat, Ntag | Annual, Yspat, Ntag | Daily, Seasonal, and Annual, Nspat, Ntag | Daily, Seasonal, and Annual, Yspat, Ntag
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2421421
Correlation 0.9418055
AUC 0.9951000
Per.Expl 82.5329751
cvDeviance 0.5074723
cvCorrelation 0.8361282
cvAUC 0.9592900
cvPer.Expl 63.3932574
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 28.8388522
temp_mean 21.3968548
AGI_250m_seas 15.4230725
uostr_mean 5.4030524
AGI_0m_seas 5.3820944
sal_mean 4.6015206
AGI_60m_seas 4.2373703
chl_mean 3.4946950
ssh_mean 2.7499598
mld_mean 1.7088384
vostr_mean 1.7086736
bathy_sd 1.6872755
uo_mean 1.3773158
vo_mean 1.2699474
pred_var 0.7204775
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 3 sal_mean 513.24
2 15 AGI_250m_seas 2 temp_mean 439.74
3 10 bathy_mean 2 temp_mean 305.07
4 15 AGI_250m_seas 10 bathy_mean 256.15
5 14 AGI_60m_seas 10 bathy_mean 207.41
6 13 AGI_0m_seas 2 temp_mean 201.25
7 13 AGI_0m_seas 6 vo_mean 184.66
8 14 AGI_60m_seas 2 temp_mean 181.04
9 10 bathy_mean 4 uo_mean 143.19
10 2 temp_mean 1 chl_mean 132.10
11 6 vo_mean 3 sal_mean 131.10
[1] "External percent deviance explained"
[1] 0.7788118
[1] "TPR"
[1] 0.7436568
[1] "TSS"
[1] 0.897761
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2030187 0.9158413 0.9873706 0.9947294 0.7788118 0.8253298
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2155743
Correlation 0.9500420
AUC 0.9967000
Per.Expl 84.4494519
cvDeviance 0.4841299
cvCorrelation 0.8444286
cvAUC 0.9628100
cvPer.Expl 65.0770781
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 52.3084109
lat 7.5685093
AGI_250m_seas 6.5650645
bathy_mean 4.9318242
temp_mean 4.7807364
AGI_0m_seas 4.5184742
AGI_60m_seas 3.7734783
sal_mean 3.4432083
chl_mean 3.2167249
ssh_mean 1.8300516
mld_mean 1.4287051
vostr_mean 1.1808924
uo_mean 1.1091414
vo_mean 0.9476019
uostr_mean 0.9377008
bathy_sd 0.8703324
pred_var 0.5891434
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 11 bathy_mean 3 temp_mean 467.69
2 15 AGI_0m_seas 7 vo_mean 276.78
3 17 AGI_250m_seas 11 bathy_mean 271.41
4 3 temp_mean 1 lat 255.78
5 13 dist_coast 10 mld_mean 230.24
6 15 AGI_0m_seas 4 sal_mean 207.85
7 4 sal_mean 1 lat 151.54
8 11 bathy_mean 2 chl_mean 151.13
9 6 uostr_mean 1 lat 142.70
10 3 temp_mean 2 chl_mean 141.51
11 11 bathy_mean 9 ssh_mean 131.67
12 13 dist_coast 8 vostr_mean 107.96
13 16 AGI_60m_seas 11 bathy_mean 106.29
14 13 dist_coast 1 lat 100.32
[1] "External percent deviance explained"
[1] 0.7985461
[1] "TPR"
[1] 0.7449605
[1] "TSS"
[1] 0.9133059
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1919277 0.9252447 0.9896804 0.9950189 0.7985461 0.8444945
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2628935
Correlation 0.9354953
AUC 0.9940000
Per.Expl 81.0360614
cvDeviance 0.5359687
cvCorrelation 0.8240669
cvAUC 0.9553400
cvPer.Expl 61.3376613
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.262030
temp_mean 21.315013
AGI_250m_ann 14.991258
uostr_mean 5.778944
sal_mean 5.124400
AGI_60m_ann 4.443709
chl_mean 3.980368
ssh_mean 3.370811
AGI_0m_ann 2.546665
bathy_sd 2.059610
vostr_mean 1.952209
mld_mean 1.619756
uo_mean 1.510746
vo_mean 1.324461
pred_var 0.720020
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 10 bathy_mean 3 sal_mean 726.81
2 14 AGI_60m_ann 10 bathy_mean 459.19
3 13 AGI_0m_ann 2 temp_mean 378.68
4 15 AGI_250m_ann 2 temp_mean 273.55
5 13 AGI_0m_ann 3 sal_mean 218.74
6 10 bathy_mean 2 temp_mean 216.81
7 14 AGI_60m_ann 2 temp_mean 206.70
8 15 AGI_250m_ann 14 AGI_60m_ann 187.66
9 15 AGI_250m_ann 13 AGI_0m_ann 184.90
10 14 AGI_60m_ann 3 sal_mean 136.67
11 6 vo_mean 3 sal_mean 133.79
[1] "External percent deviance explained"
[1] 0.7612256
[1] "TPR"
[1] 0.742517
[1] "TSS"
[1] 0.8878601
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7900 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2125943 0.9072882 0.9855127 0.9933046 0.7612256 0.8103606
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.2360291
Correlation 0.9435098
AUC 0.9956000
Per.Expl 82.9739352
cvDeviance 0.5061817
cvCorrelation 0.8352547
cvAUC 0.9599800
cvPer.Expl 63.4863619
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 51.6411632
lat 8.4674358
AGI_60m_ann 6.0984294
bathy_mean 5.2557204
AGI_250m_ann 4.8594233
temp_mean 4.7777648
chl_mean 3.8187141
sal_mean 3.5230954
AGI_0m_ann 2.2155331
ssh_mean 2.0640587
mld_mean 1.3580563
vostr_mean 1.3000387
uo_mean 1.1310191
uostr_mean 1.0961652
bathy_sd 0.9743069
vo_mean 0.8236064
pred_var 0.5954694
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 16 AGI_60m_ann 11 bathy_mean 511.73
2 11 bathy_mean 3 temp_mean 409.15
3 6 uostr_mean 1 lat 217.27
4 3 temp_mean 1 lat 197.20
5 15 AGI_0m_ann 3 temp_mean 188.34
6 11 bathy_mean 1 lat 169.31
7 2 chl_mean 1 lat 168.59
8 4 sal_mean 1 lat 164.04
9 17 AGI_250m_ann 16 AGI_60m_ann 155.97
10 15 AGI_0m_ann 4 sal_mean 147.21
11 13 dist_coast 10 mld_mean 142.77
12 13 dist_coast 8 vostr_mean 121.93
13 11 bathy_mean 2 chl_mean 113.84
14 15 AGI_0m_ann 11 bathy_mean 110.48
[1] "External percent deviance explained"
[1] 0.781367
[1] "TPR"
[1] 0.7438418
[1] "TSS"
[1] 0.9010301
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2021057 0.9165818 0.9878297 0.9938869 0.781367 0.8297394
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.1865064
Correlation 0.9596064
AUC 0.9980000
Per.Expl 86.5462763
cvDeviance 0.4570929
cvCorrelation 0.8569081
cvAUC 0.9661000
cvPer.Expl 67.0274031
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 27.3057785
temp_mean 19.7302461
AGI_250m_seas 10.1427791
AGI_0m 6.7539420
uostr_mean 5.0843977
sal_mean 3.7797233
AGI_0m_seas 3.4398070
AGI_250m_ann 3.0656614
AGI_250m 2.9094675
AGI_60m_seas 2.6130226
chl_mean 2.2934217
AGI_60m_ann 2.2513231
ssh_mean 2.1152245
AGI_60m 1.3693592
bathy_sd 1.3626378
vostr_mean 1.2977303
AGI_0m_ann 1.1555551
uo_mean 1.0380480
vo_mean 0.9418376
mld_mean 0.9201980
pred_var 0.4298396
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m 2 temp_mean 3310.95
2 20 AGI_60m_ann 10 bathy_mean 357.98
3 19 AGI_0m_ann 16 AGI_0m_seas 305.39
4 16 AGI_0m_seas 6 vo_mean 284.87
5 10 bathy_mean 3 sal_mean 213.26
6 18 AGI_250m_seas 2 temp_mean 183.13
7 18 AGI_250m_seas 10 bathy_mean 172.38
8 12 AGI_0m 5 uostr_mean 170.11
9 19 AGI_0m_ann 10 bathy_mean 160.66
10 4 uo_mean 2 temp_mean 149.15
11 20 AGI_60m_ann 16 AGI_0m_seas 137.50
12 16 AGI_0m_seas 13 AGI_60m 136.91
13 12 AGI_0m 10 bathy_mean 134.92
14 10 bathy_mean 2 temp_mean 129.71
15 21 AGI_250m_ann 3 sal_mean 113.98
16 10 bathy_mean 4 uo_mean 111.18
17 19 AGI_0m_ann 11 bathy_sd 109.09
18 12 AGI_0m 3 sal_mean 106.63
19 5 uostr_mean 2 temp_mean 94.19
20 12 AGI_0m 8 ssh_mean 86.97
21 18 AGI_250m_seas 14 AGI_250m 85.14
22 13 AGI_60m 10 bathy_mean 83.26
[1] "External percent deviance explained"
[1] 0.8180819
[1] "TPR"
[1] 0.7460924
[1] "TSS"
[1] 0.9231004
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8350 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1806695 0.9341654 0.9917645 0.9944876 0.8180819 0.8654628
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862811
Residual.Deviance 0.1794387
Correlation 0.9613569
AUC 0.9982000
Per.Expl 87.0561087
cvDeviance 0.4428947
cvCorrelation 0.8609286
cvAUC 0.9681400
cvPer.Expl 68.0515947
[1] "Relative influence of predictor variables"
rel.inf
dist_coast 50.6297312
lat 7.4773131
AGI_0m 5.8649772
AGI_60m_ann 4.3063983
bathy_mean 3.7610000
temp_mean 3.3346837
AGI_0m_seas 3.0194786
AGI_250m_seas 3.0002579
sal_mean 2.3172896
chl_mean 2.2989785
AGI_60m_seas 1.9360137
AGI_250m_ann 1.6666857
AGI_250m 1.5038736
ssh_mean 1.3536386
AGI_0m_ann 1.2578651
AGI_60m 1.1737055
vostr_mean 0.9598372
mld_mean 0.8927208
uo_mean 0.8547717
uostr_mean 0.7902823
vo_mean 0.6354592
bathy_sd 0.6025325
pred_var 0.3625061
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 AGI_0m 3 temp_mean 868.66
2 13 AGI_0m 1 lat 559.83
3 22 AGI_60m_ann 11 bathy_mean 478.77
4 21 AGI_0m_ann 18 AGI_0m_seas 388.44
5 18 AGI_0m_seas 7 vo_mean 227.09
6 13 AGI_0m 11 bathy_mean 216.84
7 14 dist_coast 10 mld_mean 212.25
8 20 AGI_250m_seas 11 bathy_mean 183.69
9 21 AGI_0m_ann 11 bathy_mean 174.45
10 18 AGI_0m_seas 15 AGI_60m 169.88
11 14 dist_coast 8 vostr_mean 150.05
12 6 uostr_mean 1 lat 144.30
13 22 AGI_60m_ann 18 AGI_0m_seas 105.52
14 15 AGI_60m 11 bathy_mean 78.87
15 3 temp_mean 1 lat 68.14
16 14 dist_coast 1 lat 67.04
17 19 AGI_60m_seas 13 AGI_0m 66.47
18 23 AGI_250m_ann 4 sal_mean 62.44
19 11 bathy_mean 2 chl_mean 62.31
20 11 bathy_mean 3 temp_mean 61.34
21 23 AGI_250m_ann 9 ssh_mean 61.03
22 22 AGI_60m_ann 4 sal_mean 57.74
23 13 AGI_0m 6 uostr_mean 57.31
24 11 bathy_mean 9 ssh_mean 56.58
25 23 AGI_250m_ann 22 AGI_60m_ann 45.98
26 13 AGI_0m 9 ssh_mean 43.15
[1] "External percent deviance explained"
[1] 0.8241134
[1] "TPR"
[1] 0.7463558
[1] "TSS"
[1] 0.9231764
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8150 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1777092 0.9362438 0.992061 0.9948892 0.8241134 0.8705611
Summary table of results
output_sum_seas_ann <- read.csv(here("data/brt/mod_outputs/brt_background_seas_ann_output_summary.csv"))
# Note: this call prints `output_sum`, not the `output_sum_seas_ann` object read in above
kableExtra::kable(output_sum)
base_0m_Nspat_Ntag          78.734  0.724  0.739  0.870  0.979  0.231  0.888  0.787
base_0m_Nspat_Ytag          92.976  0.876  0.761  0.961  0.994  0.141  0.960  0.930
base_0m_Yspat_Ytag          93.544  0.887  0.770  0.964  0.995  0.125  0.963  0.935
do_0m_Nspat_Ytag            94.201  0.901  0.772  0.971  0.996  0.124  0.969  0.942
do_0m_Yspat_Ytag            95.618  0.920  0.788  0.977  0.997  0.110  0.976  0.956
do_0m_60m_Nspat_Ytag        94.865  0.908  0.775  0.973  0.997  0.119  0.972  0.949
do_0m_250m_Nspat_Ytag       95.069  0.909  0.783  0.974  0.996  0.119  0.972  0.951
do_0m_60m_250m_Nspat_Ytag   95.132  0.913  0.783  0.976  0.997  0.116  0.973  0.951
do_0m_60m_250m_Yspat_Ytag   95.186  0.918  0.784  0.977  0.997  0.113  0.975  0.952
agi_0m_Nspat_Ytag           93.845  0.901  0.765  0.971  0.997  0.124  0.970  0.938
agi_0m_Yspat_Ytag           94.754  0.916  0.776  0.975  0.998  0.114  0.974  0.948
agi_0m_60m_Nspat_Ytag       94.548  0.908  0.765  0.973  0.997  0.119  0.972  0.945
agi_0m_250m_Nspat_Ytag      93.059  0.897  0.767  0.967  0.997  0.129  0.967  0.931
agi_0m_60m_250m_Nspat_Ytag  94.111  0.907  0.767  0.972  0.997  0.122  0.971  0.941
agi_0m_60m_250m_Yspat_Ytag  95.406  0.920  0.777  0.976  0.998  0.111  0.975  0.954
library(dplyr)    # provides %>% and filter()
library(ggplot2)

# Keep only the models without spatial predictors
output_sum_seas_ann_Nspat <- output_sum_seas_ann %>%
  filter(!grepl("Yspat", model))

# AUC vs. TSS, colored by deviance_exp, with one label per model
ggplot(output_sum_seas_ann_Nspat, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab("AUC") +
  ylab("TSS") +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = "grey50",
                            max.overlaps = 20,
                            label.size = 0.5)
Conclusions from initial seasonal/annual models
Seasonal and annual base models performed better than the daily resolution base models, with the annual base model performing better than the seasonal one.
The DO and AGI models with all depth layers and temporal resolutions were clearly the best performing and had nearly identical scores across evaluation metrics. Models that also included spatial predictors performed slightly better than those without, though the two sets were fairly comparable.
For the DO model with all temporal resolutions (no spatial predictors), the two predictors with the highest relative influence were bathymetry (bathy_mean, 26.8%) and daily surface DO (o2_mean_0m, 21.5%). The next two, at considerably lower values, were seasonal DO at 250 m (o2_mean_250m_seas, 10.0%) and seasonal surface DO (o2_mean_0m_seas, 8.3%). Partial plots follow similar trends as previously described.
For the AGI model with all temporal resolutions (no spatial predictors), bathymetry (bathy_mean, 27.3%) and temperature (temp_mean, 19.7%) were the two predictors with the highest relative influence. The next two, at considerably lower values, were seasonal AGI at 250 m (AGI_250m_seas, 10.1%) and daily surface AGI (AGI_0m, 6.8%).
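The comparison behind these conclusions can be sketched by collecting the external (test-set) metrics printed above into one table and ranking the models. The short model labels (e.g. `do_all_Nspat`) are shorthand introduced here for illustration, not names used in the analysis; the values are copied from the "External percent deviance explained" and "TSS" lines of each model block.

```r
# External deviance explained and TSS copied from the model outputs above.
# Labels are illustrative shorthand: <covariate>_<resolution>_<spatial predictors>.
metrics <- data.frame(
  model = c("do_seas_Nspat", "do_seas_Yspat", "do_ann_Nspat", "do_ann_Yspat",
            "do_all_Nspat", "do_all_Yspat",
            "agi_seas_Nspat", "agi_seas_Yspat", "agi_ann_Nspat", "agi_ann_Yspat",
            "agi_all_Nspat", "agi_all_Yspat"),
  ext_dev_exp = c(0.7809057, 0.7858132, 0.7776352, 0.7932303,
                  0.8162526, 0.8206347,
                  0.7788118, 0.7985461, 0.7612256, 0.7813670,
                  0.8180819, 0.8241134),
  TSS = c(0.9010958, 0.9024247, 0.9021378, 0.9062309,
          0.9263903, 0.9268552,
          0.8977610, 0.9133059, 0.8878601, 0.9010301,
          0.9231004, 0.9231764)
)

# Rank models by external deviance explained (highest first)
ranked <- metrics[order(-metrics$ext_dev_exp), ]
head(ranked, 4)

# Average change in external deviance explained when spatial predictors are added
has_spatial  <- grepl("Yspat", metrics$model)
spatial_gain <- mean(metrics$ext_dev_exp[has_spatial]) -
  mean(metrics$ext_dev_exp[!has_spatial])
round(spatial_gain, 4)
```

Ranking on a single metric is a simplification; the same table can be re-sorted by TSS to check whether the ordering is stable.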
Model fine-tuning and selection
Here, I take the two best-performing models from the sections above (AGI and DO with all depths and temporal resolutions, without tag ID or spatial variables as predictors) as overfit reference models. The model options that follow exclude the wind predictors, as these consistently had lower relative importance than the random predictor variable we included. I also included a combined model that uses AGI at 250 m and DO at 0 m across temporal resolutions. Lastly, the final models also remove DO/AGI at 60 m and at a seasonal resolution, as these were typically the variables with the lowest predictive performance relative to the other depth layers and resolutions.
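One way to make the random-predictor screening explicit is sketched below, using the relative-influence values reported above for the DO model with all temporal resolutions and no spatial predictors. The helper `flag_near_random` and its `margin` argument are assumptions added for illustration, not the procedure used to build the refined models: `margin = 1` flags predictors at or below the influence of `pred_var`, and larger values widen the band.

```r
# Relative influence (%) copied from the DO daily/seasonal/annual Nspat model above
rel_inf <- c(
  bathy_mean = 26.7948276, o2_mean_0m = 21.5078589, o2_mean_250m_seas = 10.0195711,
  o2_mean_0m_seas = 8.3152092, o2_mean_60m_seas = 5.4522658,
  o2_mean_250m_ann = 3.7071018, o2_mean_0m_ann = 3.4441561, chl_mean = 2.6938371,
  temp_mean = 2.5287651, o2_mean_60m_ann = 1.9918635, sal_mean = 1.9409870,
  ssh_mean = 1.9325752, o2_mean_250m = 1.8523367, o2_mean_60m = 1.6689402,
  vostr_mean = 1.0311990, bathy_sd = 1.0131874, mld_mean = 1.0037532,
  uostr_mean = 0.9663831, uo_mean = 0.9203555, vo_mean = 0.6778727,
  pred_var = 0.5369536
)

# Hypothetical helper: names of predictors whose relative influence is at or
# below `margin` times that of the random predictor
flag_near_random <- function(rel_inf, random_var = "pred_var", margin = 1) {
  cutoff     <- margin * rel_inf[[random_var]]
  candidates <- rel_inf[names(rel_inf) != random_var]
  names(candidates)[candidates <= cutoff]
}

strict <- flag_near_random(rel_inf)              # at or below pred_var itself
loose  <- flag_near_random(rel_inf, margin = 2)  # within twice pred_var
strict
loose
```

For this particular model the strict rule flags nothing, while the looser band picks out the `uo`/`vo`/`uostr`/`vostr` terms together with `bathy_sd` and `mld_mean`, so the choice of margin matters when deciding what to drop.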
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_base_0m_dail_no_wind.rds",
            test_data = base_test_daily)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862741
Residual.Deviance 0.3293149
Correlation 0.9117897
AUC 0.9883000
Per.Expl 76.2446050
cvDeviance 0.6050895
cvCorrelation 0.7960592
cvAUC 0.9439300
cvPer.Expl 56.3513832
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 39.888522
temp_mean 27.784602
sal_mean 8.291711
ssh_mean 7.954539
chl_mean 7.177538
bathy_sd 3.876379
mld_mean 3.252551
pred_var 1.774158
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 6 bathy_mean 2 temp_mean 984.83
2 4 ssh_mean 2 temp_mean 822.97
3 6 bathy_mean 4 ssh_mean 815.40
[1] "External percent deviance explained"
[1] 0.7059102
[1] "TPR"
[1] 0.7375997
[1] "TSS"
[1] 0.8553636
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4250 iterations were performed.
There were 8 predictors of which 8 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2402275 0.8787054 0.9759937 1.000154 0.7059102 0.7624461
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862244
Residual.Deviance 0.2107719
Correlation 0.9504410
AUC 0.9963000
Per.Expl 84.7952550
cvDeviance 0.4559794
cvCorrelation 0.8564559
cvAUC 0.9667500
cvPer.Expl 67.1063781
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.7211943
o2_mean_0m 20.7583985
o2_mean_0m_seas 9.2882523
o2_mean_250m_seas 8.6714061
o2_mean_250m_ann 5.7739633
o2_mean_60m_seas 4.8036871
o2_mean_0m_ann 3.1994781
o2_mean_60m_ann 3.1542770
chl_mean 2.9571143
temp_mean 2.8154229
ssh_mean 2.5370378
sal_mean 2.1503543
o2_mean_250m 2.1428240
o2_mean_60m 1.9018201
bathy_sd 1.2755329
mld_mean 1.1979952
pred_var 0.6512417
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 524.36
2 16 o2_mean_60m_ann 7 bathy_mean 314.25
3 9 o2_mean_60m 7 bathy_mean 225.85
4 2 chl_mean 1 o2_mean_0m 218.47
5 14 o2_mean_250m_seas 10 o2_mean_250m 204.52
6 7 bathy_mean 5 ssh_mean 184.82
7 4 sal_mean 3 temp_mean 163.93
8 14 o2_mean_250m_seas 1 o2_mean_0m 144.74
9 16 o2_mean_60m_ann 13 o2_mean_60m_seas 126.52
10 7 bathy_mean 4 sal_mean 120.32
11 7 bathy_mean 2 chl_mean 109.90
12 12 o2_mean_0m_seas 3 temp_mean 98.20
13 14 o2_mean_250m_seas 2 chl_mean 97.60
14 9 o2_mean_60m 3 temp_mean 89.49
[1] "External percent deviance explained"
[1] 0.7936232
[1] "TPR"
[1] 0.7443195
[1] "TSS"
[1] 0.9059035
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7500 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1963482 0.9210228 0.9883101 0.9952783 0.7936232 0.8479526
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862598
Residual.Deviance 0.1725199
Correlation 0.9639600
AUC 0.9983000
Per.Expl 87.5550071
cvDeviance 0.4451753
cvCorrelation 0.8607369
cvAUC 0.9680800
cvPer.Expl 67.8865900
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.6072816
temp_mean 21.4544712
AGI_250m_seas 9.0847918
AGI_0m 7.0179705
AGI_0m_seas 4.0172439
sal_mean 4.0041197
AGI_250m_ann 3.7156705
ssh_mean 3.5991848
chl_mean 2.8199131
AGI_60m_ann 2.7817580
AGI_250m 2.7006117
AGI_60m_seas 2.6378381
AGI_0m_ann 1.7398823
AGI_60m 1.5526407
bathy_sd 1.4789096
mld_mean 1.0854342
pred_var 0.7022784
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 3569.24
2 15 AGI_0m_ann 12 AGI_0m_seas 661.16
3 6 bathy_mean 3 sal_mean 530.25
4 16 AGI_60m_ann 12 AGI_0m_seas 279.96
5 16 AGI_60m_ann 6 bathy_mean 246.01
6 17 AGI_250m_ann 3 sal_mean 210.57
7 8 AGI_0m 4 ssh_mean 192.50
8 14 AGI_250m_seas 2 temp_mean 176.08
9 3 sal_mean 2 temp_mean 173.66
10 6 bathy_mean 2 temp_mean 166.93
11 8 AGI_0m 6 bathy_mean 141.44
12 12 AGI_0m_seas 2 temp_mean 131.43
13 14 AGI_250m_seas 10 AGI_250m 125.88
14 9 AGI_60m 6 bathy_mean 125.25
[1] "External percent deviance explained"
[1] 0.8155907
[1] "TPR"
[1] 0.745645
[1] "TSS"
[1] 0.9236973
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9350 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1819719 0.9327983 0.9904085 0.9914034 0.8155907 0.8755501
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_250_DO_0_dail_seas_ann.rds",
            test_data = readRDS(here("data/brt/mod_eval/back/agi_do_test_daily_seasonal_annual.rds")))
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862598
Residual.Deviance 0.2964633
Correlation 0.9232444
AUC 0.9914000
Per.Expl 78.6141629
cvDeviance 0.5650248
cvCorrelation 0.8127604
cvAUC 0.9505600
cvPer.Expl 59.2410582
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.8590421
temp_mean 22.5046246
AGI_250m_seas 11.8954341
ssh_mean 6.1891620
sal_mean 5.9776546
AGI_250m_ann 4.9215220
chl_mean 4.7262925
AGI_250m 3.4196880
bathy_sd 2.3470344
mld_mean 1.9464328
o2_mean_0m 1.1864430
pred_var 1.0943611
o2_mean_0m_seas 1.0053067
o2_mean_0m_ann 0.9270022
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 6 bathy_mean 2 temp_mean 658.05
2 10 AGI_250m_seas 3 sal_mean 518.36
3 6 bathy_mean 3 sal_mean 489.43
4 10 AGI_250m_seas 6 bathy_mean 375.70
5 10 AGI_250m_seas 8 AGI_250m 338.48
6 3 sal_mean 2 temp_mean 268.35
7 11 AGI_250m_ann 3 sal_mean 237.76
8 10 AGI_250m_seas 2 temp_mean 236.78
9 4 ssh_mean 3 sal_mean 212.26
10 2 temp_mean 1 chl_mean 202.19
[1] "External percent deviance explained"
[1] 0.7169734
[1] "TPR"
[1] 0.7384845
[1] "TSS"
[1] 0.8584466
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7150 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2362547 0.8827105 0.9775864 0.9920047 0.7169734 0.7861416
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862244
Residual.Deviance 0.2156559
Correlation 0.9488750
AUC 0.9962000
Per.Expl 84.4429321
cvDeviance 0.4735685
cvCorrelation 0.8497414
cvAUC 0.9644500
cvPer.Expl 65.8375303
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 28.1527411
o2_mean_0m 21.5616098
o2_mean_250m_seas 12.6044992
o2_mean_0m_seas 8.9700418
o2_mean_250m_ann 6.4551558
o2_mean_0m_ann 3.9034438
temp_mean 3.5445843
chl_mean 3.2837754
sal_mean 2.6484189
ssh_mean 2.6018368
o2_mean_250m 2.3095329
bathy_sd 1.6979623
mld_mean 1.4124722
pred_var 0.8539255
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 738.70
2 12 o2_mean_250m_seas 9 o2_mean_250m 485.27
3 12 o2_mean_250m_seas 1 o2_mean_0m 272.32
4 7 bathy_mean 4 sal_mean 267.15
5 11 o2_mean_0m_seas 5 ssh_mean 246.13
6 2 chl_mean 1 o2_mean_0m 214.51
7 7 bathy_mean 3 temp_mean 202.68
8 7 bathy_mean 5 ssh_mean 194.75
9 12 o2_mean_250m_seas 4 sal_mean 193.45
10 11 o2_mean_0m_seas 7 bathy_mean 154.83
[1] "External percent deviance explained"
[1] 0.7862912
[1] "TPR"
[1] 0.7440081
[1] "TSS"
[1] 0.8961138
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2012221 0.9167496 0.9876593 0.99309 0.7862912 0.8444293
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862244
Residual.Deviance 0.2469826
Correlation 0.9381203
AUC 0.9940000
Per.Expl 82.1830688
cvDeviance 0.4811882
cvCorrelation 0.8472799
cvAUC 0.9631900
cvPer.Expl 65.2878598
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 29.8405664
bathy_mean 27.7680179
o2_mean_250m_ann 11.8978532
o2_mean_60m_ann 5.1904049
o2_mean_60m 3.8917200
o2_mean_0m_ann 3.5918036
temp_mean 3.0815911
o2_mean_250m 3.0462971
chl_mean 2.9433569
ssh_mean 2.7443431
sal_mean 2.4896773
bathy_sd 1.4784883
mld_mean 1.2964225
pred_var 0.7394577
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 657.11
2 2 chl_mean 1 o2_mean_0m 284.72
3 9 o2_mean_60m 7 bathy_mean 196.18
4 7 bathy_mean 5 ssh_mean 188.15
5 13 o2_mean_60m_ann 7 bathy_mean 171.04
6 14 o2_mean_250m_ann 1 o2_mean_0m 151.38
7 7 bathy_mean 4 sal_mean 135.20
8 9 o2_mean_60m 3 temp_mean 126.86
9 5 ssh_mean 4 sal_mean 119.71
10 13 o2_mean_60m_ann 2 chl_mean 89.85
[1] "External percent deviance explained"
[1] 0.768806
[1] "TPR"
[1] 0.7426928
[1] "TSS"
[1] 0.8915028
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6700 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2098619 0.9090659 0.9853852 0.9942654 0.768806 0.8218307
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862244
Residual.Deviance 0.2301077
Correlation 0.9440084
AUC 0.9953000
Per.Expl 83.4004030
cvDeviance 0.4774268
cvCorrelation 0.8484529
cvAUC 0.9637600
cvPer.Expl 65.5591946
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 26.2909628
o2_mean_0m_seas 25.5213284
o2_mean_250m_seas 9.3451185
o2_mean_250m_ann 6.8074128
o2_mean_0m_ann 5.4627297
o2_mean_60m_seas 5.4134035
ssh_mean 4.1943198
o2_mean_60m_ann 3.8284725
chl_mean 3.5727046
temp_mean 3.3083960
sal_mean 2.4888076
mld_mean 1.4816417
bathy_sd 1.4393027
pred_var 0.8453993
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 13 o2_mean_60m_ann 10 o2_mean_60m_seas 320.41
2 6 bathy_mean 4 ssh_mean 263.16
3 13 o2_mean_60m_ann 6 bathy_mean 259.41
4 13 o2_mean_60m_ann 2 temp_mean 240.17
5 10 o2_mean_60m_seas 3 sal_mean 200.63
6 11 o2_mean_250m_seas 6 bathy_mean 198.02
7 14 o2_mean_250m_ann 11 o2_mean_250m_seas 186.32
8 6 bathy_mean 3 sal_mean 182.04
9 9 o2_mean_0m_seas 2 temp_mean 141.21
10 9 o2_mean_0m_seas 4 ssh_mean 133.99
[1] "External percent deviance explained"
[1] 0.7788946
[1] "TPR"
[1] 0.7433807
[1] "TSS"
[1] 0.8905543
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2052818 0.9131417 0.986685 0.9949934 0.7788946 0.834004
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_ann.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862928
Residual.Deviance 0.2531001
Correlation 0.9357576
AUC 0.9936000
Per.Expl 81.7426681
cvDeviance 0.4853877
cvCorrelation 0.8455465
cvAUC 0.9626300
cvPer.Expl 64.9866397
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 29.2429111
o2_mean_0m 29.1971826
o2_mean_250m_ann 13.0192739
o2_mean_250m 5.9098611
o2_mean_0m_ann 5.4067270
temp_mean 3.6750228
chl_mean 3.6643218
ssh_mean 2.9662351
sal_mean 2.8777805
bathy_sd 1.7549023
mld_mean 1.4552967
pred_var 0.8304848
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 3 temp_mean 1 o2_mean_0m 982.23
2 2 chl_mean 1 o2_mean_0m 524.84
3 5 ssh_mean 4 sal_mean 240.01
4 7 bathy_mean 3 temp_mean 238.90
5 7 bathy_mean 1 o2_mean_0m 169.52
6 7 bathy_mean 5 ssh_mean 158.36
7 12 o2_mean_250m_ann 1 o2_mean_0m 153.57
[1] "External percent deviance explained"
[1] 0.7734847
[1] "TPR"
[1] 0.7432465
[1] "TSS"
[1] 0.8943853
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6900 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2069449 0.9120116 0.9864721 0.9965555 0.7734847 0.8174267
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_ann_refined.rds",
            test_data = do_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862921
Residual.Deviance 0.2562984
Correlation 0.9343060
AUC 0.9933000
Per.Expl 81.5119511
cvDeviance 0.4911880
cvCorrelation 0.8423507
cvAUC 0.9622600
cvPer.Expl 64.5682173
[1] "Relative influence of predictor variables"
rel.inf
o2_mean_0m 31.0126981
bathy_mean 30.2670573
o2_mean_250m_ann 19.1376868
temp_mean 4.2824471
chl_mean 4.0099421
ssh_mean 3.4992903
sal_mean 3.3314775
bathy_sd 1.8989334
mld_mean 1.6136901
pred_var 0.9467773
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 7 bathy_mean 4 sal_mean 631.44
2 3 temp_mean 1 o2_mean_0m 563.67
3 7 bathy_mean 5 ssh_mean 530.48
4 7 bathy_mean 3 temp_mean 475.02
5 10 o2_mean_250m_ann 1 o2_mean_0m 432.90
[1] "External percent deviance explained"
[1] 0.7534683
[1] "TPR"
[1] 0.7412716
[1] "TSS"
[1] 0.8824682
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7300 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2179116 0.9012468 0.9824931 0.9903099 0.7534683 0.8151195
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862598
Residual.Deviance 0.2178641
Correlation 0.9487621
AUC 0.9963000
Per.Expl 84.2840333
cvDeviance 0.4666383
cvCorrelation 0.8522336
cvAUC 0.9651100
cvPer.Expl 66.3383249
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 31.0327439
temp_mean 21.5126289
AGI_250m_seas 10.5932242
AGI_0m 7.2883846
ssh_mean 5.4945038
sal_mean 4.4691331
AGI_0m_seas 4.2234033
AGI_250m_ann 3.4503058
chl_mean 3.1456337
AGI_250m 2.9512860
AGI_0m_ann 1.9547153
bathy_sd 1.8211027
mld_mean 1.2416308
pred_var 0.8213038
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 4018.99
2 6 bathy_mean 3 sal_mean 336.19
3 12 AGI_250m_seas 3 sal_mean 317.17
4 6 bathy_mean 2 temp_mean 283.99
5 12 AGI_250m_seas 9 AGI_250m 279.87
6 13 AGI_0m_ann 11 AGI_0m_seas 264.70
7 14 AGI_250m_ann 3 sal_mean 214.18
8 3 sal_mean 2 temp_mean 207.16
9 12 AGI_250m_seas 6 bathy_mean 201.20
10 8 AGI_0m 4 ssh_mean 190.66
[1] "External percent deviance explained"
[1] 0.7826156
[1] "TPR"
[1] 0.7437599
[1] "TSS"
[1] 0.899488
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2020269 0.9161384 0.9870229 0.991535 0.7826156 0.8428403
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862598
Residual.Deviance 0.2015300
Correlation 0.9547506
AUC 0.9971000
Per.Expl 85.4623207
cvDeviance 0.4709168
cvCorrelation 0.8515545
cvAUC 0.9644900
cvPer.Expl 66.0296891
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 30.2583827
temp_mean 23.1001159
AGI_250m_ann 9.5767117
AGI_0m 8.4094393
AGI_250m 4.6324992
sal_mean 4.1360932
AGI_60m_ann 3.8957028
ssh_mean 3.7773277
chl_mean 3.2338520
AGI_60m 2.5267217
AGI_0m_ann 2.3370542
bathy_sd 1.8222966
mld_mean 1.4317833
pred_var 0.8620197
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 5062.15
2 6 bathy_mean 3 sal_mean 417.84
3 13 AGI_60m_ann 6 bathy_mean 270.08
4 9 AGI_60m 6 bathy_mean 230.29
5 6 bathy_mean 2 temp_mean 218.57
6 8 AGI_0m 4 ssh_mean 213.78
7 14 AGI_250m_ann 3 sal_mean 180.54
8 3 sal_mean 2 temp_mean 180.39
9 12 AGI_0m_ann 2 temp_mean 179.81
10 12 AGI_0m_ann 8 AGI_0m 179.68
[1] "External percent deviance explained"
[1] 0.7906133
[1] "TPR"
[1] 0.7439517
[1] "TSS"
[1] 0.9084231
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8900 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1966226 0.9207726 0.9876523 0.9929067 0.7906133 0.8546232
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862598
Residual.Deviance 0.2061333
Correlation 0.9539635
AUC 0.9971000
Per.Expl 85.1302577
cvDeviance 0.4834310
cvCorrelation 0.8450470
cvAUC 0.9630500
cvPer.Expl 65.1269532
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 30.490435
temp_mean 22.634626
AGI_250m_seas 10.000824
AGI_250m_ann 5.796857
AGI_0m_seas 5.718093
sal_mean 4.723935
ssh_mean 3.599499
AGI_60m_seas 3.530415
AGI_60m_ann 3.460432
chl_mean 3.419888
AGI_0m_ann 2.194834
bathy_sd 1.832470
mld_mean 1.671970
pred_var 0.925720
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 12 AGI_0m_ann 9 AGI_0m_seas 546.51
2 6 bathy_mean 3 sal_mean 522.64
3 14 AGI_250m_ann 3 sal_mean 322.26
4 13 AGI_60m_ann 9 AGI_0m_seas 295.22
5 6 bathy_mean 2 temp_mean 255.75
6 11 AGI_250m_seas 3 sal_mean 251.07
7 3 sal_mean 2 temp_mean 236.73
8 10 AGI_60m_seas 2 temp_mean 234.74
9 11 AGI_250m_seas 2 temp_mean 226.67
10 13 AGI_60m_ann 6 bathy_mean 207.59
[1] "External percent deviance explained"
[1] 0.786909
[1] "TPR"
[1] 0.7439235
[1] "TSS"
[1] 0.904135
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9250 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.1992057 0.9186471 0.9874306 0.9937912 0.786909 0.8513026
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_ann.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862896
Residual.Deviance 0.2468066
Correlation 0.9390492
AUC 0.9942000
Per.Expl 82.1966038
cvDeviance 0.4884958
cvCorrelation 0.8444165
cvAUC 0.9622600
cvPer.Expl 64.7623530
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 32.4609992
temp_mean 21.8560386
AGI_250m_ann 10.9191700
AGI_0m 8.6490291
ssh_mean 6.0997823
sal_mean 4.9030145
AGI_250m 4.5800341
chl_mean 3.3793154
AGI_0m_ann 2.5673111
bathy_sd 2.3310006
mld_mean 1.4227914
pred_var 0.8315138
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 5180.90
2 6 bathy_mean 3 sal_mean 323.30
3 11 AGI_0m_ann 6 bathy_mean 282.31
4 8 AGI_0m 6 bathy_mean 275.45
5 8 AGI_0m 4 ssh_mean 259.20
6 11 AGI_0m_ann 2 temp_mean 240.76
7 12 AGI_250m_ann 3 sal_mean 217.87
[1] "External percent deviance explained"
[1] 0.7734026
[1] "TPR"
[1] 0.7430413
[1] "TSS"
[1] 0.8971213
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7550 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.2058276 0.9130484 0.9860147 0.9941768 0.7734026 0.821966
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_ann_refined.rds",
            test_data = agi_test_daily_seasonal_annual)
[1] "Model performance metrics"
Model 1
Total.Deviance 1.3862418
Residual.Deviance 0.2519479
Correlation 0.9366038
AUC 0.9937000
Per.Expl 81.8251152
cvDeviance 0.4925116
cvCorrelation 0.8431762
cvAUC 0.9617600
cvPer.Expl 64.4714538
[1] "Relative influence of predictor variables"
rel.inf
bathy_mean 33.026420
temp_mean 22.293844
AGI_250m_ann 14.285529
AGI_0m 9.392833
ssh_mean 6.005079
sal_mean 5.870879
chl_mean 4.038303
bathy_sd 2.398052
mld_mean 1.658957
pred_var 1.030105
[1] "Partial plots"
[1] "Top most important pairwise interactions as identified by the model"
var1.index var1.names var2.index var2.names int.size
1 8 AGI_0m 2 temp_mean 5805.65
2 6 bathy_mean 3 sal_mean 615.53
3 8 AGI_0m 4 ssh_mean 349.75
4 10 AGI_250m_ann 2 temp_mean 334.64
5 10 AGI_250m_ann 3 sal_mean 297.82
[1] "External percent deviance explained"
[1] 0.747466
[1] "TPR"
[1] 0.7407371
[1] "TSS"
[1] 0.8782973
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family),
data = x.data, weights = site.weights, var.monotone = var.monotone,
n.trees = target.trees, interaction.depth = tree.complexity,
shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7600 iterations were performed.
There were 10 predictors of which 10 had non-zero influence.
RMSE Cor C-index PredRatio DevianceExplained PseudoR2
1 0.220844 0.8980987 0.9814319 0.9976299 0.747466 0.8182512
Summary table of results
output_sum_refined <- read.csv(here("data/brt/mod_outputs/brt_bckg_refined_output_summary.csv"))
kableExtra::kable(output_sum_refined)
| Model | Per.Expl (%) | External deviance explained | TPR | TSS | AUC (C-index) | RMSE | Cor | PseudoR2 |
|---|---|---|---|---|---|---|---|---|
| brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag | 85.922 | 0.816 | 0.746 | 0.926 | 0.991 | 0.182 | 0.933 | 0.859 |
| brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag | 86.546 | 0.818 | 0.746 | 0.923 | 0.992 | 0.181 | 0.934 | 0.865 |
| base_0m_daily_Nspat_Ntag | 78.734 | 0.724 | 0.739 | 0.870 | 0.979 | 0.231 | 0.888 | 0.787 |
| do_0m_daily_Nspat_Ntag | 83.930 | 0.785 | 0.744 | 0.906 | 0.987 | 0.199 | 0.919 | 0.839 |
| agi_0m_daily_Nspat_Ntag | 81.982 | 0.775 | 0.743 | 0.903 | 0.987 | 0.204 | 0.915 | 0.820 |
| brt_base_0m_dail_no_wind | 76.245 | 0.706 | 0.738 | 0.855 | 0.976 | 0.240 | 0.879 | 0.762 |
| brt_do_0m_60m_250m_dail_seas_ann_no_wind | 84.795 | 0.794 | 0.744 | 0.906 | 0.988 | 0.196 | 0.921 | 0.848 |
| brt_agi_0m_60m_250m_dail_seas_ann_no_wind | 87.555 | 0.816 | 0.746 | 0.924 | 0.990 | 0.182 | 0.933 | 0.876 |
| brt_agi_250_do_0_dail_seas_ann | 78.614 | 0.717 | 0.738 | 0.858 | 0.978 | 0.236 | 0.883 | 0.786 |
| brt_do_0m_250m_dail_seas_ann | 84.443 | 0.786 | 0.744 | 0.896 | 0.988 | 0.201 | 0.917 | 0.844 |
| brt_do_0m_60m_250m_dail_ann | 82.183 | 0.769 | 0.743 | 0.891 | 0.985 | 0.210 | 0.909 | 0.821 |
| brt_do_0m_60m_250m_seas_ann | 83.400 | 0.779 | 0.743 | 0.891 | 0.987 | 0.205 | 0.913 | 0.834 |
| brt_do_0m_250m_dail_ann | 81.743 | 0.773 | 0.743 | 0.894 | 0.986 | 0.207 | 0.912 | 0.817 |
| brt_do_0m_250m_dail_ann_refined | 81.512 | 0.753 | 0.741 | 0.882 | 0.982 | 0.218 | 0.901 | 0.815 |
| brt_agi_0m_250m_dail_seas_ann | 84.284 | 0.783 | 0.744 | 0.899 | 0.987 | 0.202 | 0.916 | 0.843 |
| brt_agi_0m_60m_250m_dail_ann | 85.462 | 0.791 | 0.744 | 0.908 | 0.988 | 0.197 | 0.921 | 0.855 |
| brt_agi_0m_60m_250m_seas_ann | 85.130 | 0.787 | 0.744 | 0.904 | 0.987 | 0.199 | 0.919 | 0.851 |
| brt_agi_0m_250m_dail_ann | 82.197 | 0.773 | 0.743 | 0.897 | 0.986 | 0.206 | 0.913 | 0.822 |
| brt_agi_0m_250m_dail_ann_refined | 81.825 | 0.747 | 0.740 | 0.878 | 0.981 | 0.221 | 0.898 | 0.818 |
ggplot(output_sum_refined, aes(AUC, TSS, color = deviance_exp, label = model)) +
  geom_point(size = 5) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                            box.padding = 0.35,
                            point.padding = 0.5,
                            segment.color = 'grey50',
                            max.overlaps = 20,
                            label.size = 0.5)
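Alongside the scatter plot, a simple ranking can make the comparison among the refined models easier to read. The sketch below uses a subset of rows from the summary table above and orders them by external (held-out) deviance explained, breaking ties on TSS. The column name `ext_dev_exp` is an assumption, since the CSV header is not shown in this document.

```r
# Subset of rows from the summary table; `ext_dev_exp` is an assumed column
# name for the external deviance explained on the 25% test split.
refined <- data.frame(
  model = c("brt_base_0m_dail_no_wind",
            "brt_do_0m_60m_250m_dail_seas_ann_no_wind",
            "brt_agi_0m_60m_250m_dail_seas_ann_no_wind",
            "brt_do_0m_250m_dail_ann_refined",
            "brt_agi_0m_250m_dail_ann_refined"),
  ext_dev_exp = c(0.706, 0.794, 0.816, 0.753, 0.747),
  TSS = c(0.855, 0.906, 0.924, 0.882, 0.878)
)

# Rank from best to worst on held-out deviance explained, then on TSS
ranked <- refined[order(-refined$ext_dev_exp, -refined$TSS), ]
rownames(ranked) <- NULL
print(ranked)
```

The same ordering could be applied to the full `output_sum_refined` data frame once the actual column names are substituted.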
Conclusions from refined models